Search Results

Search found 18 results on 1 pages for 'alabaster codify'.

Page 1/1 | 1 

  • Java parsing UTF8

    - by Jack
    I have the following issue with a UTF8 files structured as following: FIELD1§FIELD2§FIELD3§FIELD4 Looking at hexadecimal values of the file it uses A7 to codify §. So according to this codify it should be UTF8, but it's strange because A7 7F so 1 byte shouldn't be enough to codify §. So I tried using directly a BufferedReader with a specified charset: BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(input), utf8)) but when I try to tokenize the string with SmartTokenizer st = new SmartTokenizer(toTokenize, "§") (the SmartTokenizer is a modified version of the StringTokenizer that keeps empty tokens) no splitting occurs, and if I try to print the string I obtain FIELD1?FIELD2?FIELD3?... so § used in the file is different from the one specified as a the delimiter, and it's not able to print out it too. So what's the problem here? Maybe the original file should use 2 bytes to store §?

    Read the article

  • Changing POST data used by Apache Bench per iteration

    - by Alabaster Codify
    I'm using ab to do some load testing, and it's important that the supplied querystring (or POST) parameters change between requests. I.e. I need to make requests to URLs like: http://127.0.0.1:9080/meth?param=0 http://127.0.0.1:9080/meth?param=1 http://127.0.0.1:9080/meth?param=2 ... to properly exercise the application. ab seems to only read the supplied POST data file once, at startup, so changing its content during the test run is not an option. Any suggestions?

    Read the article

  • Code golf: combining multiple sorted lists into a single sorted list

    - by Alabaster Codify
    Implement an algorithm to merge an arbitrary number of sorted lists into one sorted list. The aim is to create the smallest working programme, in whatever language you like. For example: input: ((1, 4, 7), (2, 5, 8), (3, 6, 9)) output: (1, 2, 3, 4, 5, 6, 7, 8, 9) input: ((1, 10), (), (2, 5, 6, 7)) output: (1, 2, 5, 6, 7, 10) Note: solutions which concatenate the input lists then use a language-provided sort function are not in-keeping with the spirit of golf, and will not be accepted: sorted(sum(lists,[])) # cheating: out of bounds! Apart from anything else, your algorithm should be (but doesn't have to be) a lot faster! Clearly state the language, any foibles and the character count. Only include meaningful characters in the count, but feel free to add whitespace to the code for artistic / readability purposes. To keep things tidy, suggest improvement in comments or by editing answers where appropriate, rather than creating a new answer for each "revision". EDIT: if I was submitting this question again, I would expand on the "no language provided sort" rule to be "don't concatenate all the lists then sort the result". Existing entries which do concatenate-then-sort are actually very interesting and compact, so I won't retro-actively introduce a rule they break, but feel free to work to the more restrictive spec in new submissions. Inspired by http://stackoverflow.com/questions/464342/combining-two-sorted-lists-in-python

    Read the article

  • C++ : Lack of Standardization at the Binary Level

    - by Nawaz
    Why ISO/ANSI didn't standardize C++ at the binary level? There are many portability issues with C++, which is only because of lack of it's standardization at the binary level. Don Box writes, (quoting from his book Essential COM, chapter COM As A Better C++) C++ and Portability Once the decision is made to distribute a C++ class as a DLL, one is faced with one of the fundamental weaknesses of C++, that is, lack of standardization at the binary level. Although the ISO/ANSI C++ Draft Working Paper attempts to codify which programs will compile and what the semantic effects of running them will be, it makes no attempt to standardize the binary runtime model of C++. The first time this problem will become evident is when a client tries to link against the FastString DLL's import library from a C++ developement environment other than the one used to build the FastString DLL. Are there more benefits Or loss of this lack of binary standardization?

    Read the article

  • C++ : Lack of Standardization at the Binary Level

    - by Nawaz
    Why ISO/ANSI didn't standardize C++ at the binary level? There are many portability issues with C++, which is only because of lack of it's standardization at the binary level. Don Box writes, (quoting from his book Essential COM, chapter COM As A Better C++) C++ and Portability Once the decision is made to distribute a C++ class as a DLL, one is faced with one of the fundamental weaknesses of C++, that is, lack of standardization at the binary level. Although the ISO/ANSI C++ Draft Working Paper attempts to codify which programs will compile and what the semantic effects of running them will be, it makes no attempt to standardize the binary runtime model of C++. The first time this problem will become evident is when a client tries to link against the FastString DLL's import library from a C++ developement environment other than the one used to build the FastString DLL. Are there more benefits Or loss of this lack of binary standardization?

    Read the article

  • Flatten (an irregular) list of lists in Python

    - by telliott99
    Yes, I know this subject has been covered before (here, here, here, here), but AFAIK, all solutions save one choke on a list like this: L = [[[1, 2, 3], [4, 5]], 6] where the desired output is [1, 2, 3, 4, 5, 6] or perhaps even better, an iterator. The only solution I saw that works for an arbitrary nesting is from @Alabaster Codify here: def flatten(x): result = [] for el in x: if hasattr(el, "__iter__") and not isinstance(el, basestring): result.extend(flatten(el)) else: result.append(el) return result flatten(L) So to my question: is this the best model? Did I overlook something? Any problems?

    Read the article

  • Formalizing a requirements spec written in narrative English

    - by ProfK
    I have a fairly technical functionality requirements spec, expressed in English prose, produced by my project manager. It is structured as a collection of UI tabs, where the requirements for each tab are expressed as a lit of UI fields and a list of business rules for the tab. Most business rules are for UI fields on a tab, e.g: a) Must be alphanumeric, max length 20. b) Must be a dropdown, with values from table x. c) Is mandatory. d) Is mandatory under certain conditions, e.g. another field is just populated, or has a specific value. Then other business rules get a little more complex. The spec is for a job application, so the central business object (table) is the Applicant, and we have several other tables with one-to-many relationships with applicant, such as Degree, HighSchool, PreviousEmployer, Diploma, etc. e) One such complex rule says a status field can only be assigned a certain value if a many-side record exists in at least one of the many-side tables. E.g. the Applicant has at least one HighSchool or at least one Diploma record. I am looking for advice on how to codify these requirements into a more structured specification defined in terms of tables, fields, and relationships, especially for the conditional rules for fields and for the presence of related records. Any suggestions and advice will be most welcome, but I would be overjoyed if i could find an already defined system or structure for expressing things like this.

    Read the article

  • Codifying a natural language requirements spec

    - by ProfK
    I have a fairly technical functionality requirements spec, expressed in English prose, produced by my project manager. It is structured as a collection of UI tabs, where the requirements for each tab are expressed as a lit of UI fields and a list of business rules for the tab. Most business rules are for UI fields on a tab, e.g: a) Must be alphanumeric, max length 20. b) Must be a dropdown, with values from table x. c) Is mandatory. d) Is mandatory under certain conditions, e.g. another field is just populated, or has a specific value. Then other business rules get a little more complex. The spec is for a job application, so the central business object (table) is the Applicant, and we have several other tables with one-to-many relationships with applicant, such as Degree, HighSchool, PreviousEmployer, Diploma, etc. e) One such complex rule says a status field can only be assigned a certain value if a many-side record exists in at least one of the many-side tables. E.g. the Applicant has at least one HighSchool or at least one Diploma record. I am looking for advice on how to codify these requirements into a more structured specification defined in terms of tables, fields, and relationships, especially for the conditional rules for fields and for the presence of related records. Any suggestions and advice will be most welcome, but I would be overjoyed if i could find an already defined system or structure for expressing things like this.

    Read the article

  • Hidden Gems: Accelerating Oracle Data Integrator with SOA, Groovy, SDK, and XML

    - by Alex Kotopoulis
    On the last day of Oracle OpenWorld, we had a final advanced session on getting the most out of Oracle Data Integrator through the use of various advanced techniques. The primary way to improve your ODI processes is to choose the optimal knowledge modules for your load and take advantage of the optimized tools of your database, such as OracleDataPump and similar mechanisms in other databases. Knowledge modules also allow you to customize tasks, allowing you to codify best practices that are consistently applied by all integration developers. ODI SDK is another very powerful means to automate and speed up your integration development process. This allows you to automate Life Cycle Management, code comparison, repetitive code generation and change of your integration projects. The SDK is easily accessible through Java or scripting languages such as Groovy and Jython. Finally, all Oracle Data Integration products provide services that can be integrated into a larger Service Oriented Architecture. This moved data integration from an isolated environment into an agile part of a larger business process environment. All Oracle data integration products can play a part in thisracle GoldenGate can integrate into business event streams by processing JMS queues or publishing new events based on database transactions. Oracle GoldenGate can integrate into business event streams by processing JMS queues or publishing new events based on database transactions. Oracle Data Integrator allows full control of its runtime sessions through web services, so that integration jobs can become part of business processes. Oracle Data Service Integrator provides a data virtualization layer over your distributed sources, allowing unified reading and updating for heterogeneous data without replicating and moving data. Oracle Enterprise Data Quality provides data quality services to cleanse and deduplicate your records through web services.

    Read the article

  • C++ : Lack of Standardization at the Binary Level

    - by Nawaz
    Why ISO/ANSI didn't standardize C++ at the binary level? There are many portability issues with C++, which is only because of lack of it's standardization at the binary level. Don Box writes, (quoting from his book Essential COM, chapter COM As A Better C++) C++ and Portability Once the decision is made to distribute a C++ class as a DLL, one is faced with one of the fundamental weaknesses of C++, that is, lack of standardization at the binary level. Although the ISO/ANSI C++ Draft Working Paper attempts to codify which programs will compile and what the semantic effects of running them will be, it makes no attempt to standardize the binary runtime model of C++. The first time this problem will become evident is when a client tries to link against the FastString DLL's import library from a C++ developement environment other than the one used to build the FastString DLL. Are there more benefits Or loss of this lack of binary standardization?

    Read the article

  • Why do old programming languages continue to be revised?

    - by SunAvatar
    This question is not, "Why do people still use old programming languages?" I understand that quite well. In fact the two programming languages I know best are C and Scheme, both of which date back to the 70s. Recently I was reading about the changes in C99 and C11 versus C89 (which seems to still be the most-used version of C in practice and the version I learned from K&R). Looking around, it seems like every programming language in heavy use gets a new specification at least once per decade or so. Even Fortran is still getting new revisions, despite the fact that most people using it are still using FORTRAN 77. Contrast this with the approach of, say, the typesetting system TeX. In 1989, with the release of TeX 3.0, Donald Knuth declared that TeX was feature-complete and future releases would contain only bug fixes. Even beyond this, he has stated that upon his death, "all remaining bugs will become features" and absolutely no further updates will be made. Others are free to fork TeX and have done so, but the resulting systems are renamed to indicate that they are different from the official TeX. This is not because Knuth thinks TeX is perfect, but because he understands the value of a stable, predictable system that will do the same thing in fifty years that it does now. Why do most programming language designers not follow the same principle? Of course, when a language is relatively new, it makes sense that it will go through a period of rapid change before settling down. And no one can really object to minor changes that don't do much more than codify existing pseudo-standards or correct unintended readings. But when a language still seems to need improvement after ten or twenty years, why not just fork it or start over, rather than try to change what is already in use? If some people really want to do object-oriented programming in Fortran, why not create "Objective Fortran" for that purpose, and leave Fortran itself alone? I suppose one could say that, regardless of future revisions, C89 is already a standard and nothing stops people from continuing to use it. This is sort of true, but connotations do have consequences. GCC will, in pedantic mode, warn about syntax that is either deprecated or has a subtly different meaning in C99, which means C89 programmers can't just totally ignore the new standard. So there must be some benefit in C99 that is sufficient to impose this overhead on everyone who uses the language. This is a real question, not an invitation to argue. Obviously I do have an opinion on this, but at the moment I'm just trying to understand why this isn't just how things are done already. I suppose the question is: What are the (real or perceived) advantages of updating a language standard, as opposed to creating a new language based on the old?

    Read the article

  • Gartner PCC Follow-up: Interview with Chaeny Emanavin, Usability Lead - Office of Information Develo

    - by [email protected]
    Last week at the Gartner Portals, Content and Collaboration conference in Baltimore, Chaeny and I co-presented on Oracle Enterprise 2.0 and BIA's Citizen Portal. Chaeny's presentation about the BIA solution was very well received and I wanted to do a follow-up interview with Chaeny to discuss more details about their solution and its Enterprise 2.0 features. Ajay: What were the main objectives for the BIA Citizen Portal? Chaeny: The BIA Citizen Portal is designed to provide all the services of the Bureau of Indian Affairs to the community of 564 federally recognized tribes that include over 1.9 million American Indians and Alaska Natives. The BIA provides the same breadth of services that the entire U.S. Federal Government provides in one small Bureau. So, we needed a solution that was flexible enough to handle content ranging from law enforcement to housing to education. Key objectives for external users was to use the Web as a communications channel and keep them informed on what services are available. We also wanted to build an internal web presence and community for BIA's 5000 employees to ensure that they update their content, leverage internal experts and create single sources of truth for key policy documents. Ajay: How is the project being implemented? Chaeny: We are using a phased approach. In phases 1 & 2, interim internal and external sites were built to ensure usability and functional requirements are being met. In Phases 3 & 4, we built out a modern internal and external presence using Oracle WebCenter Suite and Oracle Universal Content Management (UCM), including enabling delegated content management for our internal business units. Phase 4 was completed in January 2010. Phase 5 will add deeper Enterprise 2.0 collaboration capabilities to the solution. Ajay: Are you integrating any existing sites into the new solution? Chaeny: Yes, we have a SharePoint implementation that we are using for document management. We needed more precise functionality however. We found that SharePoint would let individual administrators of a SharePoint site actually create new sites. In a 3 months span, we had over 200 new sites created and most were not being used. So, we had an enormous sprawl problem. Our requirements mandated increased governance and more granular control over the creation of sites and flexible user access to content. In SharePoint this required custom code and was very time-intensive which was unfeasible given our tight deadlines. We are piloting Oracle WebCenter Spaces as our collaboration solution to mitigate these issues. However, we must integrate our existing SharePoint investment which we can do easily by using the SharePoint connectors available in Oracle WebCenter and UCM. Ajay: What were the key design parameters for your solution? Chaeny: We wanted everything driven by standards and policies. We created a cross-functional steering group called the Indian Affairs Web Council to codify policies that were baked into the system. Other key design areas were focused on security/governance, self-service content management, ease of use, integration with legacy applications and seamless single sign-on. We are using Dublin Core as our metadata standard. We also are using Java, APEX, and ADF as our development standards. Ajay: Why was it important to standardize on a platform? Chaeny: We initially looked at best-of-breed solutions, but we faced a lot of issues getting the different solutions to work together. Going with an integrated solution was more economical, easier to learn and faster to deliver the solution. Ajay: What type of legacy applications are you integrating into the portal? Chaeny: Initially we are starting with administrative apps such as people directory and user admin and then we will integrate HR and Financial applications among others. Ajay: Can you describe some of the E20 collaboration features you are putting into the solution? Chaeny: We are adding Enterprise 2.0 using Oracle WebCenter Spaces to deliver different collaboration tools such as wikis, blogs and discussion forums. Wikis to create rapid, ad hoc monthly roll-up reports; discussion forums to provide context-specific help; blogs to capture tacit organization knowledge from experts, identify gurus and turn tacit knowledge into explicit knowledge. Ajay: Are you doing anything specifically to spur adoption and usage? Chaeny: Yes, we did several things that I think helped us ramp quickly. First, we met our commitments for the new system launch date and also provided extra resources for a customer support "hotline" during the launch period. Prior to launch, we did exhaustive usability studies to capture user requirements around functionality, navigation and other key interaction areas. We also created extensive training programs so that the content managers in each business unit were comfortable using the content management tools and knew the best practices for usage. Finally, to launch the Enterprise 2.0 collaboration capabilities, we are working with a pilot group from the Division of Forestry and Wildland Fire Management of BIA. This group of people in the past have been willing early adopters and they have a strong business need to collaborate with many agencies both internal and external across State, County and other Federal jurisdictions. Their feedback is key to helping us launch Enterprise 2.0 successfully in our broader organization. Ajay: What were the biggest benefits to internal BIA employees and to the external community of users? Chaeny: For our employees, the new Enterprise 2.0-based solution will make it easier to find information; enhance employee productivity by embedding standard business processes into the system and create more of a community by creating connections with experts via social collaboration to ultimately provide better services more quickly. For the external American Indian and Alaska Native communities, we have a better relationship with the users and the new site has improved BIA's perception as a more responsive and customer-centric organization.

    Read the article

  • That Escalated Quickly

    - by Jesse Taber
    Originally posted on: http://geekswithblogs.net/GruffCode/archive/2014/05/17/that-escalated-quickly.aspxI have been working remotely out of my home for over 4 years now. All of my coworkers during that time have also worked remotely. Lots of folks have written about the challenges inherent in facilitating communication on remote teams and strategies for overcoming them. A popular theme around this topic is the notion of “escalating communication”. In this context “escalating” means taking a conversation from one mode of communication to a different, higher fidelity mode of communication. Here are the five modes of communication I use at work in order of increasing fidelity: Email – This is the “lowest fidelity” mode of communication that I use. I usually only check it a few times a day (and I’m trying to check it even less frequently than that) and I only keep items in my inbox if they represent an item I need to take action on that I haven’t tracked anywhere else. Forums / Message boards – Being a developer, I’ve gotten into the habit of having other people look over my code before it becomes part of the product I’m working on. These code reviews often happen in “real time” via screen sharing, but I also always have someone else give all of the changes another look using pull requests. A pull request takes my code and lets someone else see the changes I’ve made side-by-side with the existing code so they can see if I did anything dumb. Pull requests can facilitate a conversation about the code changes in an online-forum like style. Some teams I’ve worked on also liked using tools like Trello or Google Groups to have on-going conversations about a topic or task that was being worked on. Chat & Instant Messaging  - Chat and instant messaging are the real workhorses for communication on the remote teams I’ve been a part of. I know some teams that are co-located that also use it pretty extensively for quick messages that don’t warrant walking across the office to talk with someone but reqire more immediacy than an e-mail. For the purposes of this post I think it’s important to note that the terms “chat” and “instant messaging” might insinuate that the conversation is happening in real time, but that’s not always true. Modern chat and IM applications maintain a searchable history so people can easily see what might have been discussed while they were away from their computers. Voice, Video and Screen sharing – Everyone’s got a camera and microphone on their computers now, and there are an abundance of services that will let you use them to talk to other people who have cameras and microphones on their computers. I’m including screen sharing here as well because, in my experience, these discussions typically involve one or more people showing the other participants something that’s happening on their screen. Obviously, this mode of communication is much higher-fidelity than any of the ones listed above. Scheduled meetings are typically conducted using this mode of communication. In Person – No matter how great communication tools become, there’s no substitute for meeting with someone face-to-face. However, opportunities for this kind of communcation are few and far between when you work on a remote team. When a conversation gets escalated that usually means it moves up one or more positions on this list. A lot of people advocate jumping to #4 sooner than later. Like them, I used to believe that, if it was possible, organizing a call with voice and video was automatically better than any kind of text-based communication could be. Lately, however, I’m becoming less convinced that escalating is always the right move. Working Asynchronously Last year I attended a talk at our local code camp given by Drew Miller. Drew works at GitHub and was talking about how they use GitHub internally. Many of the folks at GitHub work remotely, so communication was one of the main themes in Drew’s talk. During the talk Drew used the phrase, “asynchronous communication” to describe their use of chat and pull request comments. That phrase stuck in my head because I hadn’t heard it before but I think it perfectly describes the way in which remote teams often need to communicate. You don’t always know when your co-workers are at their computers or what hours (if any) they are working that day. In order to work this way you need to assume that the person you’re talking to might not respond right away. You can’t always afford to wait until everyone required is online and available to join a voice call, so you need to use text-based, persistent forms of communication so that people can receive and respond to messages when they are available. Going back to my list from the beginning of this post for a second, I characterize items #1-3 as being “asynchronous” modes of communication while we could call items #4 and #5 “synchronous”. When communication gets escalated it’s almost always moving from an asynchronous mode of communication to a synchronous one. Now, to the point of this post: I’ve become increasingly reluctant to escalate from asynchronous to synchronous communication for two primary reasons: 1 – You can often find a higher fidelity way to convey your message without holding a synchronous conversation 2 - Asynchronous modes of communication are (usually) persistent and searchable. You Don’t Have to Broadcast Live Let’s start with the first reason I’ve listed. A lot of times you feel like you need to escalate to synchronous communication because you’re having difficulty describing something that you’re seeing in words. You want to provide the people you’re conversing with some audio-visual aids to help them understand the point that you’re trying to make and you think that getting on Skype and sharing your screen with them is the best way to do that. Firing up a screen sharing session does work well, but you can usually accomplish the same thing in an asynchronous manner. For example, you could take a screenshot and annotate it with some text and drawings to illustrate what it is you’re seeing. If a screenshot won’t work, taking a short screen recording while your narrate over it and posting the video to your forum or chat system along with a text-based description of what’s in the recording that can be searched for later can be a great way to effectively communicate with your team asynchronously. I Said What?!? Now for the second reason I listed: most asynchronous modes of communication provide a transcript of what was said and what decisions might have been made during the conversation. There have been many occasions where I’ve used the search feature of my team’s chat application to find a conversation that happened several weeks or months ago to remember what was decided. Unfortunately, I think the benefits associated with the persistence of communicating asynchronously often get overlooked when people decide to escalate to a in-person meeting or voice/video call. I’m becoming much more reluctant to suggest a voice or video call if I suspect that it might lead to codifying some kind of design decision because everyone involved is going to hang up the call and immediately forget what was decided. I recognize that you can record and archive these types of interactions, but without being able to search them the recordings aren’t terribly useful. When and How To Escalate I don’t mean to imply that communicating via voice/video or in person is never a good idea. I probably jump on a Skype call with a co-worker at least once a day to quickly hash something out or show them a bit of code that I’m working on. Also, meeting in person periodically is really important for remote teams. There’s no way around the fact that sometimes it’s easier to jump on a call and show someone my screen so they can see what I’m seeing. So when is it right to escalate? I think the simplest way to answer that is when the communication starts to feel painful. Everyone’s tolerance for that pain is different, but I think you need to let it hurt a little bit before jumping to synchronous communication. When you do escalate from asynchronous to synchronous communication, there are a couple of things you can do to maximize the effectiveness of the communication: Takes notes – This is huge and yet I’ve found that a lot of teams don’t do this. If you’re holding a meeting with  > 2 people you should have someone taking notes. Taking notes while participating in a meeting can be difficult but there are a few strategies to deal with this challenge that probably deserve a short post of their own. After the meeting, make sure the notes are posted to a place where all concerned parties (including those that might not have attended the meeting) can review and search them. Persist decisions made ASAP – If any decisions were made during the meeting, persist those decisions to a searchable medium as soon as possible following the conversation. All the teams I’ve worked on used a web-based system for tracking the on-going work and a backlog of work to be done in the future. I always try to make sure that all of the cards/stories/tasks/whatever in these systems always reflect the latest decisions that were made as the work was being planned and executed. If held a quick call with your team lead and decided that it wasn’t worth the effort to build real-time validation into that new UI you were working on, go and codify that decision in the story associated with that work immediately after you hang up. Even better, write it up in the story while you are both still on the phone. That way when the folks from your QA team pick up the story to test a few days later they’ll know why the real-time validation isn’t there without having to invoke yet another conversation about the work. Communicating Well is Hard At this point you might be thinking that communicating asynchronously is more difficult than having a live conversation. You’re right: it is more difficult. In order to communicate effectively this way you need to very carefully think about the message that you’re trying to convey and craft it in a way that’s easy for your audience to understand. This is almost always harder than just talking through a problem in real time with someone; this is why escalating communication is such a popular idea. Why wouldn’t we want to do the thing that’s easier? Easier isn’t always better. If you and your team can get in the habit of communicating effectively in an asynchronous manner you’ll find that, over time, all of your communications get less painful because you don’t need to re-iterate previously made points over and over again. If you communicate right the first time, you often don’t need to rehash old conversations because you can go back and find the decisions that were made laid out in plain language. You’ll also find that you get better at doing things like writing useful comments in your code, creating written documentation about how the feature that you just built works, or persuading your team to do things in a certain way.

    Read the article

  • Building Simple Workflows in Oozie

    - by dan.mcclary
    Introduction More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site. Hive Actions: Prepping for Pig In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in a handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie. I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me flexibility of Hive's query language when I want it, but let's me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code. CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column 1, column2) PARTITIONED BY (yr string) STORED AS ... LOCATION '/user/oracle/weather/historic'; As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow. Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of their long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing SerDe or a parser for transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access. ALTER TABLE historic_weather ADD IF NOT EXISTS PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2011'; INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history' SELECT w.stn, w.wban, w.weather_year, w.weather_month, w.weather_day, w.temp, w.dewp, w.weather FROM ( FROM historic_weather SELECT TRANSFORM(...) USING '/path/to/hive/filters/ncdc_parser.py' as stn, wban, weather_year, weather_month, weather_day, temp, dewp, weather ) w; Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive actions using what's somewhat obviously referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called weather_train.hql. Starting Our Workflow Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions of Workflow jobs, but they can be automatically started either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling. The bare minimum for workflow XML defines a name, a starting point, and an end point: <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1"> <start to="ParseNCDCData"/> <end name="end"/> </workflow-app> To this we need to add an action, and within that we'll specify the hive parameters Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure. <action name="ParseNCDCData"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <configuration> <property> <name>oozie.hive.defaults</name> <value>/user/oracle/weather_ooze/hive-default.xml</value> </property> </configuration> <script>ncdc_parse.hql</script> </hive> <ok to="WeatherMan"/> <error to="end"/> </action> There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode. I have to include a hive-default.xml file. I have to include a script file. The hive-default.xml and script file must be stored in HDFS That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-defaults.xml on different clusters (e.g. MySQL or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need to it as you add actions to workflow.xml. At this point, our local directory should contain: workflow.xml hive-defaults.xml (make sure this file contains your metastore connection data) ncdc_parse.hql Adding Pig to the Ooze Adding our Pig script as an action is slightly simpler from an XML standpoint. All we do is add an action to workflow.xml as follows: <action name="WeatherMan"> <pig> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <script>weather_train.pig</script> </pig> <ok to="end"/> <error to="end"/> </action> Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My pig script registers the Weka Jar and a chunk of jython. If those aren't also in HDFS, our action will fail from the outset -- but where do we put them? The Jython script goes into the working directory at the same level as the pig script, because pig attempts to load Jython files in the directory from which the script executes. However, that's not where our Weka jar goes. While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding Jars to the distributed cache from Oozie's Pig Cookbook. Making the Workflow Work We've got a workflow defined and have collected all the components we'll need to run. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server. In the same working directory, we'll make a file called job.properties as follows: nameNode=hdfs://localhost:8020 jobTracker=localhost:8021 queueName=default weatherRoot=weather_ooze mapreduce.jobtracker.kerberos.principal=foo dfs.namenode.kerberos.principal=foo oozie.libpath=${nameNode}/user/oozie/share/lib oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot} outputDir=weather-ooze While some of the pieces of the properties file are familiar (e.g., JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job. The oozie.libpath pieces is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of Jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed and write the output to the output directory. We're finally ready to submit our job! After all that work we only need to do a few more things: Validate our workflow.xml Copy our working directory to HDFS Submit our job to the Oozie server Run our workflow Let's do them in order. First validate the workflow: oozie validate workflow.xml Next, copy the working directory up to HDFS: hadoop fs -put working_dir /user/oracle/working_dir Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument. oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit We've submitted the job, but we don't see any activity on the JobTracker? All I got was this funny bit of output: 14-20120525161321-oozie-oracle This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job. oozie -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run." This will prep and run the workflow immediately. Takeaway So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.

    Read the article

  • Laissez les bon temps rouler! (Microsoft BI Conference 2010)

    - by smisner
    "Laissez les bons temps rouler" is a Cajun phrase that I heard frequently when I lived in New Orleans in the mid-1990s. It means "Let the good times roll!" and encapsulates a feeling of happy expectation. As I met with many of my peers and new acquaintances at the Microsoft BI Conference last week, this phrase kept running through my mind as people spoke about their plans in their respective businesses, the benefits and opportunities that the recent releases in the BI stack are providing, and their expectations about the future of the BI stack. Notwithstanding some jabs here and there to point out the platform is neither perfect now nor will be anytime soon (along with admissions that the competitors are also not perfect), and notwithstanding several missteps by the event organizers (which I don't care to enumerate), the overarching mood at the conference was positive. It was a refreshing change from the doom and gloom hovering over several conferences that I attended in 2009. Although many people expect economic hardships to continue over the coming year or so, everyone I know in the BI field is busier than ever and expects to stay busy for quite a while. Self-Service BI Self-service was definitely a theme of the BI conference. In the keynote, Ted Kummert opened with a look back to a fairy tale vision of self-service BI that he told in 2008. At that time, the fairy tale future was a time when "every end user was able to use BI technologies within their job in order to move forward more effectively" and transitioned to the present time in which SQL Server 2008 R2, Office 2010, and SharePoint 2010 are available to deliver managed self-service BI. This set of technologies is presumably poised to address the needs of the 80% of users that Kummert said do not use BI today. He proceeded to outline a series of activities that users ought to be able to do themselves--from simple changes to a report like formatting or an addtional data visualization to integration of an additional data source. The keynote then continued with a series of demonstrations of both current and future technology in support of self-service BI. Some highlights that interested me: PowerPivot, of course, is the flagship product for self-service BI in the Microsoft BI stack. In the TechEd keynote, which was open to the BI conference attendees, Amir Netz (twitter) impressed the audience by demonstrating interactivity with a workbook containing 100 million rows. He upped the ante at the BI keynote with his demonstration of a future-state PowerPivot workbook containing over 2 billion records. It's important to note that this volume of data is being processed by a server engine, and not in the PowerPivot client engine. (Yes, I think it's impressive, but none of my clients are typically wrangling with 2 billion records at a time. Maybe they're thinking too small. This ability to work quickly with large data sets has greater implications for BI solutions than for self-service BI, in my opinion.) Amir also demonstrated KPIs for the future PowerPivot, which appeared to be easier to implement than in any other Microsoft product that supports KPIs, apart from simple KPIs in SharePoint. (My initial reaction is that we have one more place to build KPIs. Great. It's confusing enough. I haven't seen how well those KPIs integrate with other BI tools, which will be important for adoption.) One more PowerPivot feature that Amir showed was a graphical display of the lineage for calculations. (This is hugely practical, especially if you build up calculations incrementally. You can more easily follow the logic from calculation to calculation. Furthermore, if you need to make a change to one calculation, you can assess the impact on other calculations.) Another product demonstration will be available within the next 30 days--Pivot for Reporting Services. If you haven't seen this technology yet, check it out at www.getpivot.com. (It definitely has a wow factor, but I'm skeptical about its practicality. However, I'm looking forward to trying it out with data that I understand.) Michael Tejedor (twitter) demonstrated a feature that I think is really interesting and not emphasized nearly enough--overshadowed by PowerPivot, no doubt. That feature is the Microsoft Business Intelligence Indexing Connector, which enables search of the content of Excel workbooks and Reporting Services reports. (This capability existed in MOSS 2007, but was more cumbersome to implement. The search results in SharePoint 2010 are not only cooler, but more useful by describing whether the content is found in a table or a chart, for example.) This may yet be the dawning of the age of self-service BI - a phrase I've heard repeated from time to time over the last decade - but I think BI professionals are likely to stay busy for a long while, and need not start looking for a new line of work. Kummert repeatedly referenced strategic BI solutions in contrast to self-service BI to emphasize that self-service BI is not a replacement for the services that BI professionals provide. After all, self-service BI does not appear magically on user desktops (or whatever device they want to use). A supporting infrastructure is necessary, and grows in complexity in proportion to the need to simplify BI for users. It's one thing to hear the party line touted by Microsoft employees at the BI keynote, but it's another to hear from the people who are responsible for implementing and supporting it within an organization. Rob Collie (blog | twitter), Kasper de Jonge (blog | twitter), Vidas Matelis (site | twitter), and I were invited to join Andrew Brust (blog | twitter) as he led a Birds of a Feather session at TechEd entitled "PowerPivot: Is It the BI Deal-Changer for Developers and IT Pros?" I would single out the prevailing concern in this session as the issue of control. On one side of this issue were those who were concerned that they would lose control once PowerPivot is implemented. On the other side were those who believed that data should be freely accessible to users in PowerPivot, and even acknowledgment that users would get the data they want even if it meant they would have to manually enter into a workbook to have it ready for analysis. For another viewpoint on how PowerPivot played out at the conference, see Rob Collie's observations. Collaborative BI I have been intrigued by the notion of collaborative BI for a very long time. Before I discovered BI, I was a Lotus Notes developer and later a manager of developers, working in a software company that enabled collaboration in the legal industry. Not only did I help create collaborative systems for our clients, I created a complete project management from the ground up to collaboratively manage our custom development work. In that case, collaboration involved my team, my client contacts, and me. I was also able to produce my own BI from that system as well, but didn't know that's what I was doing at the time. Only in recent years has SharePoint begun to catch up with the capabilities that I had with Lotus Notes more than a decade ago. Eventually, I had the opportunity at that job to formally investigate BI as another product offering for our software, and the rest - as they say - is history. I built my first data warehouse with Scott Cameron (who has also ventured into the authoring world by writing Analysis Services 2008 Step by Step and was at the BI Conference last week where I got to reminisce with him for a bit) and that began a career that I never imagined at the time. Fast forward to 2010, and I'm still lauding the virtues of collaborative BI, if only the tools will catch up to my vision! Thus, I was anxious to see what Donald Farmer (blog | twitter) and Rita Sallam of Gartner had to say on the subject in their session "Collaborative Decision Making." As I suspected, the tools aren't quite there yet, but the vendors are moving in the right direction. One thing I liked about this session was a non-Microsoft perspective of the state of the industry with regard to collaborative BI. In addition, this session included a better demonstration of SharePoint collaborative BI capabilities than appeared in the BI keynote. Check out the video in the link to the session to see the demonstration. One of the use cases that was demonstrated was linking from information to a person, because, as Donald put it, "People don't trust data, they trust people." The Microsoft BI Stack in General A question I hear all the time from students when I'm teaching is how to know what tools to use when there is overlap between products in the BI stack. I've never taken the time to codify my thoughts on the subject, but saw that my friend Dan Bulos provided good insight on this topic from a variety of perspectives in his session, "So Many BI Tools, So Little Time." I thought one of his best points was that ideally you should be able to design in your tool of choice, and then deploy to your tool of choice. Unfortunately, the ideal is yet to become real across the platform. The closest we come is with the RDL in Reporting Services which can be produced from two different tools (Report Builder or Business Intelligence Development Studio's Report Designer), manually, or by a third-party or custom application. I have touted the idea for years (and publicly said so about 5 years ago) that eventually more products would be RDL producers or consumers, but we aren't there yet. Maybe in another 5 years. Another interesting session that covered the BI stack against a backdrop of competitive products was delivered by Andrew Brust. Andrew did a marvelous job of consolidating a lot of information in a way that clearly communicated how various vendors' offerings compared to the Microsoft BI stack. He also made a particularly compelling argument about how the existence of an ecosystem around the Microsoft BI stack provided innovation and opportunities lacking for other vendors. Check out his presentation, "How Does the Microsoft BI Stack...Stack Up?" Expo Hall I had planned to spend more time in the Expo Hall to see who was doing new things with the BI stack, but didn't manage to get very far. Each time I set out on an exploratory mission, I got caught up in some fascinating conversations with one or more of my peers. I find interacting with people that I meet at conferences just as important as attending sessions to learn something new. There were a couple of items that really caught me eye, however, that I'll share here. Pragmatic Works. Whether you develop SSIS packages, build SSAS cubes, or author SSRS reports (or all of the above), you really must take a look at BI Documenter. Brian Knight (twitter) walked me through the key features, and I must say I was impressed. Once you've seen what this product can do, you won't want to document your BI projects any other way. You can download a free single-user database edition, or choose from more feature-rich standard or professional editions. Microsoft Press ebooks. I also stopped by the O'Reilly Media booth to meet some folks that one of my acquisitions editors at Microsoft Press recommended. In case you haven't heard, Microsoft Press has partnered with O'Reilly Media for distribution and publishing. Apart from my interest in learning more about O'Reilly Media as an author, an advertisement in their booth caught me eye which I think is a really great move. When you buy Microsoft Press ebooks through the O'Reilly web site, you can receive it in any (or all) of the following formats where possible: PDF, epub, .mobi for Kindle and .apk for Android. You also have lifetime DRM-free access to the ebooks. As someone who is an avid collector of books, I fnd myself running out of room for storage. In addition, I travel a lot, and it's hard to lug my reference library with me. Today's e-reader options make the move to digital books a more viable way to grow my library. Having a variety of formats means I am not limited to a single device, and lifetime access means I don't have to worry about keeping track of where I've stored my files. Because the e-books are DRM-free, I can copy and paste when I'm compiling notes, and I can print pages when necessary. That's a winning combination in my mind! Overall, I was pleased with the BI conference. There were many more sessions that I couldn't attend, either because the room was full when I got there or there were multiple sessions running concurrently that I wanted to see. Fortunately, many of the sessions are accessible for viewing online at http://www.msteched.com/2010/NorthAmerica along with the TechEd sessions. You can spot the BI sessions by the yellow skyline on the title slide of the presentation as shown below. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Laissez les bon temps rouler! (Microsoft BI Conference 2010)

    - by smisner
    Laissez les bons temps rouler" is a Cajun phrase that I heard frequently when I lived in New Orleans in the mid-1990s. It means "Let the good times roll!" and encapsulates a feeling of happy expectation. As I met with many of my peers and new acquaintances at the Microsoft BI Conference last week, this phrase kept running through my mind as people spoke about their plans in their respective businesses, the benefits and opportunities that the recent releases in the BI stack are providing, and their expectations about the future of the BI stack.Notwithstanding some jabs here and there to point out the platform is neither perfect now nor will be anytime soon (along with admissions that the competitors are also not perfect), and notwithstanding several missteps by the event organizers (which I don't care to enumerate), the overarching mood at the conference was positive. It was a refreshing change from the doom and gloom hovering over several conferences that I attended in 2009. Although many people expect economic hardships to continue over the coming year or so, everyone I know in the BI field is busier than ever and expects to stay busy for quite a while.Self-Service BISelf-service was definitely a theme of the BI conference. In the keynote, Ted Kummert opened with a look back to a fairy tale vision of self-service BI that he told in 2008. At that time, the fairy tale future was a time when "every end user was able to use BI technologies within their job in order to move forward more effectively" and transitioned to the present time in which SQL Server 2008 R2, Office 2010, and SharePoint 2010 are available to deliver managed self-service BI.This set of technologies is presumably poised to address the needs of the 80% of users that Kummert said do not use BI today. He proceeded to outline a series of activities that users ought to be able to do themselves--from simple changes to a report like formatting or an addtional data visualization to integration of an additional data source. The keynote then continued with a series of demonstrations of both current and future technology in support of self-service BI. Some highlights that interested me:PowerPivot, of course, is the flagship product for self-service BI in the Microsoft BI stack. In the TechEd keynote, which was open to the BI conference attendees, Amir Netz (twitter) impressed the audience by demonstrating interactivity with a workbook containing 100 million rows. He upped the ante at the BI keynote with his demonstration of a future-state PowerPivot workbook containing over 2 billion records. It's important to note that this volume of data is being processed by a server engine, and not in the PowerPivot client engine. (Yes, I think it's impressive, but none of my clients are typically wrangling with 2 billion records at a time. Maybe they're thinking too small. This ability to work quickly with large data sets has greater implications for BI solutions than for self-service BI, in my opinion.)Amir also demonstrated KPIs for the future PowerPivot, which appeared to be easier to implement than in any other Microsoft product that supports KPIs, apart from simple KPIs in SharePoint. (My initial reaction is that we have one more place to build KPIs. Great. It's confusing enough. I haven't seen how well those KPIs integrate with other BI tools, which will be important for adoption.)One more PowerPivot feature that Amir showed was a graphical display of the lineage for calculations. (This is hugely practical, especially if you build up calculations incrementally. You can more easily follow the logic from calculation to calculation. Furthermore, if you need to make a change to one calculation, you can assess the impact on other calculations.)Another product demonstration will be available within the next 30 days--Pivot for Reporting Services. If you haven't seen this technology yet, check it out at www.getpivot.com. (It definitely has a wow factor, but I'm skeptical about its practicality. However, I'm looking forward to trying it out with data that I understand.)Michael Tejedor (twitter) demonstrated a feature that I think is really interesting and not emphasized nearly enough--overshadowed by PowerPivot, no doubt. That feature is the Microsoft Business Intelligence Indexing Connector, which enables search of the content of Excel workbooks and Reporting Services reports. (This capability existed in MOSS 2007, but was more cumbersome to implement. The search results in SharePoint 2010 are not only cooler, but more useful by describing whether the content is found in a table or a chart, for example.)This may yet be the dawning of the age of self-service BI - a phrase I've heard repeated from time to time over the last decade - but I think BI professionals are likely to stay busy for a long while, and need not start looking for a new line of work. Kummert repeatedly referenced strategic BI solutions in contrast to self-service BI to emphasize that self-service BI is not a replacement for the services that BI professionals provide. After all, self-service BI does not appear magically on user desktops (or whatever device they want to use). A supporting infrastructure is necessary, and grows in complexity in proportion to the need to simplify BI for users.It's one thing to hear the party line touted by Microsoft employees at the BI keynote, but it's another to hear from the people who are responsible for implementing and supporting it within an organization. Rob Collie (blog | twitter), Kasper de Jonge (blog | twitter), Vidas Matelis (site | twitter), and I were invited to join Andrew Brust (blog | twitter) as he led a Birds of a Feather session at TechEd entitled "PowerPivot: Is It the BI Deal-Changer for Developers and IT Pros?" I would single out the prevailing concern in this session as the issue of control. On one side of this issue were those who were concerned that they would lose control once PowerPivot is implemented. On the other side were those who believed that data should be freely accessible to users in PowerPivot, and even acknowledgment that users would get the data they want even if it meant they would have to manually enter into a workbook to have it ready for analysis. For another viewpoint on how PowerPivot played out at the conference, see Rob Collie's observations.Collaborative BII have been intrigued by the notion of collaborative BI for a very long time. Before I discovered BI, I was a Lotus Notes developer and later a manager of developers, working in a software company that enabled collaboration in the legal industry. Not only did I help create collaborative systems for our clients, I created a complete project management from the ground up to collaboratively manage our custom development work. In that case, collaboration involved my team, my client contacts, and me. I was also able to produce my own BI from that system as well, but didn't know that's what I was doing at the time. Only in recent years has SharePoint begun to catch up with the capabilities that I had with Lotus Notes more than a decade ago. Eventually, I had the opportunity at that job to formally investigate BI as another product offering for our software, and the rest - as they say - is history. I built my first data warehouse with Scott Cameron (who has also ventured into the authoring world by writing Analysis Services 2008 Step by Step and was at the BI Conference last week where I got to reminisce with him for a bit) and that began a career that I never imagined at the time.Fast forward to 2010, and I'm still lauding the virtues of collaborative BI, if only the tools will catch up to my vision! Thus, I was anxious to see what Donald Farmer (blog | twitter) and Rita Sallam of Gartner had to say on the subject in their session "Collaborative Decision Making." As I suspected, the tools aren't quite there yet, but the vendors are moving in the right direction. One thing I liked about this session was a non-Microsoft perspective of the state of the industry with regard to collaborative BI. In addition, this session included a better demonstration of SharePoint collaborative BI capabilities than appeared in the BI keynote. Check out the video in the link to the session to see the demonstration. One of the use cases that was demonstrated was linking from information to a person, because, as Donald put it, "People don't trust data, they trust people."The Microsoft BI Stack in GeneralA question I hear all the time from students when I'm teaching is how to know what tools to use when there is overlap between products in the BI stack. I've never taken the time to codify my thoughts on the subject, but saw that my friend Dan Bulos provided good insight on this topic from a variety of perspectives in his session, "So Many BI Tools, So Little Time." I thought one of his best points was that ideally you should be able to design in your tool of choice, and then deploy to your tool of choice. Unfortunately, the ideal is yet to become real across the platform. The closest we come is with the RDL in Reporting Services which can be produced from two different tools (Report Builder or Business Intelligence Development Studio's Report Designer), manually, or by a third-party or custom application. I have touted the idea for years (and publicly said so about 5 years ago) that eventually more products would be RDL producers or consumers, but we aren't there yet. Maybe in another 5 years.Another interesting session that covered the BI stack against a backdrop of competitive products was delivered by Andrew Brust. Andrew did a marvelous job of consolidating a lot of information in a way that clearly communicated how various vendors' offerings compared to the Microsoft BI stack. He also made a particularly compelling argument about how the existence of an ecosystem around the Microsoft BI stack provided innovation and opportunities lacking for other vendors. Check out his presentation, "How Does the Microsoft BI Stack...Stack Up?"Expo HallI had planned to spend more time in the Expo Hall to see who was doing new things with the BI stack, but didn't manage to get very far. Each time I set out on an exploratory mission, I got caught up in some fascinating conversations with one or more of my peers. I find interacting with people that I meet at conferences just as important as attending sessions to learn something new. There were a couple of items that really caught me eye, however, that I'll share here.Pragmatic Works. Whether you develop SSIS packages, build SSAS cubes, or author SSRS reports (or all of the above), you really must take a look at BI Documenter. Brian Knight (twitter) walked me through the key features, and I must say I was impressed. Once you've seen what this product can do, you won't want to document your BI projects any other way. You can download a free single-user database edition, or choose from more feature-rich standard or professional editions.Microsoft Press ebooks. I also stopped by the O'Reilly Media booth to meet some folks that one of my acquisitions editors at Microsoft Press recommended. In case you haven't heard, Microsoft Press has partnered with O'Reilly Media for distribution and publishing. Apart from my interest in learning more about O'Reilly Media as an author, an advertisement in their booth caught me eye which I think is a really great move. When you buy Microsoft Press ebooks through the O'Reilly web site, you can receive it in any (or all) of the following formats where possible: PDF, epub, .mobi for Kindle and .apk for Android. You also have lifetime DRM-free access to the ebooks. As someone who is an avid collector of books, I fnd myself running out of room for storage. In addition, I travel a lot, and it's hard to lug my reference library with me. Today's e-reader options make the move to digital books a more viable way to grow my library. Having a variety of formats means I am not limited to a single device, and lifetime access means I don't have to worry about keeping track of where I've stored my files. Because the e-books are DRM-free, I can copy and paste when I'm compiling notes, and I can print pages when necessary. That's a winning combination in my mind!Overall, I was pleased with the BI conference. There were many more sessions that I couldn't attend, either because the room was full when I got there or there were multiple sessions running concurrently that I wanted to see. Fortunately, many of the sessions are accessible for viewing online at http://www.msteched.com/2010/NorthAmerica along with the TechEd sessions. You can spot the BI sessions by the yellow skyline on the title slide of the presentation as shown below. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • PASS: Bylaw Changes

    - by Bill Graziano
    While you’re reading this, a post should be going up on the PASS blog on the plans to change our bylaws.  You should be able to find our old bylaws, our proposed bylaws and a red-lined version of the changes.  We plan to listen to feedback until March 31st.  At that point we’ll decide whether to vote on these changes or take other action. The executive summary is that we’re adding a restriction to prevent more than two people from the same company on the Board and eliminating the Board’s Officer Appointment Committee to have Officers directly elected by the Board.  This second change better matches how officer elections have been conducted in the past. The Gritty Details Our scope was to change bylaws to match how PASS actually works and tackle a limited set of issues.  Changing the bylaws is hard.  We’ve been working on these changes since the March board meeting last year.  At that meeting we met and talked through the issues we wanted to address.  In years past the Board has tried to come up with language and then we’ve discussed and negotiated to get to the result.  In March, we gave HQ guidance on what we wanted and asked them to come up with a starting point.  Hannes worked on building us an initial set of changes that we could work our way through.  Discussing changes like this over email is difficult wasn’t very productive.  We do a much better job on this at the in-person Board meetings.  Unfortunately there are only 2 or 3 of those a year. In August we met in Nashville and spent time discussing the changes.  That was also the day after we released the slate for the 2010 election. The discussion around that colored what we talked about in terms of these changes.  We talked very briefly at the Summit and again reviewed and revised the changes at the Board meeting in January.  This is the result of those changes and discussions. We made numerous small changes to clean up language and make wording more clear.  We also made two big changes. Director Employment Restrictions The first is that only two people from the same company can serve on the Board at the same time.  The actual language in section VI.3 reads: A maximum of two (2) Directors who are employed by, or who are joint owners or partners in, the same for-profit venture, company, organization, or other legal entity, may concurrently serve on the PASS Board of Directors at any time. The definition of “employed” is at the sole discretion of the Board. And what a mess this turns out to be in practice.  Our membership is a hodgepodge of interlocking relationships.  Let’s say three Board members get together and start a blog service for SQL Server bloggers.  It’s technically for-profit.  Let’s assume it makes $8 in the first year.  Does that trigger this clause?  (Technically yes.)  We had a horrible time trying to write language that covered everything.  All the sample bylaws that we found were just as vague as this. That led to the third clause in this section.  The first sentence reads: The Board of Directors reserves the right, strictly on a case-by-case basis, to overrule the requirements of Section VI.3 by majority decision for any single Director’s conflict of employment. We needed some way to handle the trivial issues and exercise some judgment.  It seems like a public vote is the best way.  This discloses the relationship and gets each Board member on record on the issue.   In practice I think this clause will rarely be used.  I think this entire section will only be invoked for actual employment issues and not for small side projects.  In either case we have the mechanisms in place to handle it in a public, transparent way. That’s the first and third clauses.  The second clause says that if your situation changes and you fall afoul of this restriction you need to notify the Board.  The clause further states that if this new job means a Board members violates the “two-per-company” rule the Board may request their resignation.  The Board can also  allow the person to continue serving with a majority vote.  I think this will also take some judgment.  Consider a person switching jobs that leads to three people from the same company.  I’m very likely to ask for someone to resign if all three are two weeks into a two year term.  I’m unlikely to ask anyone to resign if one is two weeks away from ending their term.  In either case, the decision will be a public vote that we can be held accountable for. One concern that was raised was whether this would affect someone choosing to accept a job.  I think that’s a choice for them to make.  PASS is clearly stating its intent that only two directors from any one organization should serve at any time.  Once these bylaws are approved, this policy should not come as a surprise to any potential or current Board members considering a job change.  This clause isn’t perfect.  The biggest hole is business relationships that aren’t defined above.  Let’s say that two employees from company “X” serve on the Board.  What happens if I accept a full-time consulting contract with that company?  Let’s assume I’m working directly for one of the two existing Board members.  That doesn’t violate section VI.3.  But I think it’s clearly the kind of relationship we’d like to prevent.  Unfortunately that was even harder to write than what we have now.  I fully expect that in the next revision of the bylaws we’ll address this.  It just didn’t make it into this one. Officer Elections The officer election process received a slightly different rewrite.  Our goal was to codify in the bylaws the actual process we used to elect the officers.  The officers are the President, Executive Vice-President (EVP) and Vice-President of Marketing.  The Immediate Past President (IPP) is also an officer but isn’t elected.  The IPP serves in that role for two years after completing their term as President.  We do that for continuity’s sake.  Some organizations have a President-elect that serves for one or two years.  The group that founded PASS chose to have an IPP. When I started on the Board, the Nominating Committee (NomCom) selected the slate for the at-large directors and the slate for the officers.  There was always one candidate for each officer position.  It wasn’t really an election so much as the NomCom decided who the next person would be for each officer position.  Behind the scenes the Board worked to select the best people for the role. In June 2009 that process was changed to bring it line with what actually happens.  An Officer Appointment Committee was created that was a subset of the Board.  That committee would take time to interview the candidates and present a slate to the Board for approval.  The majority vote of the Board would determine the officers for the next two years.  In practice the Board itself interviewed the candidates and conducted the elections.  That means it was time to change the bylaws again. Section VII.2 and VII.3 spell out the process used to select the officers.  We use the phrase “Officer Appointment” to separate it from the Director election but the end result is that the Board elects the officers.  Section VII.3 starts: Officers shall be appointed bi-annually by a majority of all the voting members of the Board of Directors. Everything else revolves around that sentence.  We use the word appoint but they truly are elected.  There are details in the bylaws for term limits, minimum requirements for President (1 prior term as an officer), tie breakers and filling vacancies. In practice we will have an election for President, then an election for EVP and then an election for VP Marketing.  That means that losing candidates will be able to fall down the ladder and run for the next open position.  Another point to note is that officers aren’t at-large directors.  That means if a current sitting officer loses all three elections they are off the Board.  Having Board member votes public will help with the transparency of this approach. This process has a number of positive and negatives.  The biggest concern I expect to hear is that our members don’t directly choose the officers.  I’m going to try and list all the positives and negatives of this approach. Many non-profits value continuity and are slower to change than a business.  On the plus side this promotes that.  On the negative side this promotes that.  If we change too slowly the members complain that we aren’t responsive.  If we change too quickly we make mistakes and fail at various things.  We’ve been criticized for both of those lately so I’m not entirely sure where to draw the line.  My rough assumption to this point is that we’re going too slow on governance and too quickly on becoming “more than a Summit.”  This approach creates competition in the officer elections.  If you are an at-large director there is no consequence to losing an election.  If you are an officer the only way to stay on the Board is to win an officer election or an at-large election.  If you are an officer and lose an election you can always run for the next office down.  This makes it very easy for multiple people to contest an election. There is value in a person moving through the officer positions up to the Presidency.  Having the Board select the officers promotes this.  The down side is that it takes a LOT of time to get to the Presidency.  We’ve had good people struggle with burnout.  We’ve had lots of discussion around this.  The process as we’ve described it here makes it possible for someone to move quickly through the ranks but doesn’t prevent people from working their way up through each role. We talked long and hard about having the officers elected by the members.  We had a self-imposed deadline to complete these changes prior to elections this summer. The other challenge was that our original goal was to make the bylaws reflect our actual process rather than create a new one.  I believe we accomplished this goal. We ran out of time to consider this option in the detail it needs.  Having member elections for officers needs a number of problems solved.  We would need a way for candidates to fall through the election.  This is what promotes competition.  Without this few people would risk an election and we’ll be back to one candidate per slot.  We need to do this without having multiple elections.  We may be able to copy what other organizations are doing but I was surprised at how little I could find on other organizations.  We also need a way for people that lose an officer election to win an at-large election.  Otherwise we’ll have very little competition for officers. This brings me to an area that I think we as a Board haven’t done a good job.  We haven’t built a strong process to tell you who is doing a good job and who isn’t.  This is a double-edged sword.  I don’t want to highlight Board members that are failing.  That’s not a good way to get people to volunteer and run for the Board.  But I also need a way let the members make an informed choice about who is doing a good job and would make a good officer.  Encouraging Board members to blog, publishing minutes and making votes public helps in that regard but isn’t the final answer.  I don’t know what the final answer is yet.  I do know that the Board members themselves are uniquely positioned to know which other Board members are doing good work.  They know who speaks up in meetings, who works to build consensus, who has good ideas and who works with the members.  What I Could Do Better I’ve learned a lot writing this about how we communicated with our members.  The next time we revise the bylaws I’d do a few things differently.  The biggest change would be to provide better documentation.  The March 2009 minutes provide a very detailed look into what changes we wanted to make to the bylaws.  Looking back, I’m a little surprised at how closely they matched our final changes and covered the various arguments.  If you just read those you’d get 90% of what we eventually changed.  Nearly everything else was just details around implementation.  I’d also consider publishing a scope document defining exactly what we were doing any why.  I think it really helped that we had a limited, defined goal in mind.  I don’t think we did a good job communicating that goal outside the meeting minutes though. That said, I wish I’d blogged more after the August and January meeting.  I think it would have helped more people to know that this change was coming and to be ready for it. Conclusion These changes address two big concerns that the Board had.  First, it prevents a single organization from dominating the Board.  Second, it codifies and clearly spells out how officers are elected.  This is the process that was previously followed but it was somewhat murky.  These changes bring clarity to this and clearly explain the process the Board will follow. We’re going to listen to feedback until March 31st.  At that time we’ll decide whether to approve these changes.  I’m also assuming that we’ll start another round of changes in the next year or two.  Are there other issues in the bylaws that we should tackle in the future?

    Read the article

  • CodePlex Daily Summary for Wednesday, July 31, 2013

    CodePlex Daily Summary for Wednesday, July 31, 2013Popular ReleasesSharePoint 2010 Export User Information to Text file, SQL server: Full Source code of the project: Please change SharePoint server address. I have used my sharepoint server -> http://sneakpreviewWINJS CTK (Control Toolkit): Initial Release: Initial release supports the following. Expander NumericBoxXMLPreprocess: 2.0.18: What's new in this release For XML Spreadsheet 2003 format, used the frozen row at the top of the worksheet to indicate the beginning of the values. This prevents you from having to start your values at row 7. This can be overridden with the /firstValueRow (or /vr) argument. Issues Fixed Fix for Issue 13006 : Corrected default treatment of the value "false" Beginning with this version, the FixFalse behavior that has caused confusion to so many, has hopefully been addressed in a way that st...MVVM.EventToCommand: Tommy.MVVM.EventToCommand: Tommy.MVVM.EventToCommand is a free, open source developer focused event2command via WinRT for MVVM Pattern.MVC Generator: MVC Generator Visual Studio Addin: This is the latest build, this includes the MVCGenerator.dll, and Visual Studio Addin file. See the home page of this project for installation instructions.Project Nonnon: 2013_07_30: ----------==========----------==========----------==========---------- "No news is good news." ----------==========----------==========----------==========---------- Change Log 2013/07/30 BUGFIX game/sound/directsound.c n_directsound_loop() OLD : crash when DirectSound is not supported NEW : fixed game/sound/waveout.c when included OLD : compile error NEW : fixed game/chara.c when included OLD : compile error NEW : fixed game/direct2d.c when included ...Dynamics CRM 2011 EasyPlugins: EasyPlugins-1.2.3.1-managed: V1.2.3.1 - Bug Fix : Condition expression is not saved on action creation - Bug Fix : Request Param with single attribute result - Some style changes v1.2.3.0 - Bug Fix : Twice plugins execution. - New Abort action - Depth Plugin Execution management on each action - Turn On/Off EasyPlugins feature (can be useful in some cases of imports) v1.2.0.0 Associate / Disassociate actions are now available Import / Export features Better management of Lookups Trigger NamingXmlObjectMapper: XOM Alpha 1.0: first Alpha Version of XOM: read and write xml nodes and attributes mapping to IList or Interfaces simple property attribute mapping creates new xml nodes at run time IList changes (add, remove) implemented in the next releasenopCommerce. Open source shopping cart (ASP.NET MVC): nopCommerce 3.10: Highlight features & improvements: • Performance optimization. • New more user-friendly product/product-variant logic. Now we'll have only products (simple and grouped). • Bundle products support added. • Allow a store owner to associate product image for product variant attribute values. To see the full list of fixes and changes please visit the release notes page (http://www.nopCommerce.com/releasenotes.aspx).ExtJS based ASP.NET Controls: FineUI v3.3.1: ??FineUI ?? ExtJS ??? ASP.NET ???。 FineUI??? ?? No JavaScript,No CSS,No UpdatePanel,No ViewState,No WebServices ???????。 ?????? IE 7.0、Firefox 3.6、Chrome 3.0、Opera 10.5、Safari 3.0+ ???? Apache License v2.0 ?:ExtJS ?? GPL v3 ?????(http://www.sencha.com/license)。 ???? ??:http://fineui.com/bbs/ ??:http://fineui.com/demo/ ??:http://fineui.com/doc/ ??:http://fineui.codeplex.com/ FineUI ???? ExtJS ????????,???? ExtJS ?,???????????ExtJS?: 1. ????? FineUI ? ExtJS ?:http://fineui.com/bbs/fo...AutoNLayered - Domain Oriented N-Layered .NET 4.5: AutoNLayered v1.0.5: - Fix Dtos. Abstract collections replaced by concrete (correct serialization WCF). - OrderBy in navigation properties. - Unit Test with Fakes. - Map of entities/dto moved to application services. - Libraries updated. Warning using Fakes: http://connect.microsoft.com/VisualStudio/feedback/details/782031/visual-studio-2012-add-fakes-assembly-does-not-add-all-needed-referencesPath Copy Copy: 11.1: Minor release with two new features: Submenu's contextual menu item now has an icon next to it Added reference to JavaScript regular expression format in Settings application Since this release does not have any glaring bug fixes, it is more of an optional update for existing users. It depends on whether you want to be able to spot the Path Copy Copy submenu more easily. I recommend you install it to see if the icon makes sense. As always, please don't hesitate to leave feedback via Discus...CMake Tools for Visual Studio: CMake Tools for Visual Studio 1.0 RC3: This is the third release candidate of CMake Tools for Visual Studio 1.0, which contains the following bug fixes: Opening a CMake file from Windows Explorer while Visual Studio is already open will no start a new instance of Visual Studio. Typing a symbol while the IntelliSense list box is visible and the text typed so far does not match any item in the list will dismiss the list box and insert the symbol typed.R.NET: R.NET 1.5: The major changes in v1.5 are: Initialize method must be called before using R. Settings should be passed to the method. EagerEvaluate method renamed to Evaluate (use Defer method when you want old version of Evaluate).Media Companion: Media Companion MC3.574b: Some good bug fixes been going on with the new XBMC-Link function. Thanks to all who were able to do testing and gave feedback. New:* Added some adhoc extra General movie filters, one of which is Plot = Outline (see fixes above). To see the filters, add the following line to your config.xml: <ShowExtraMovieFilters>True</ShowExtraMovieFilters>. The others are: Imdb in folder name, Imdb in not folder name & Imdb not in folder name & year mismatch. * Movie - display <tag> list on browser tab ...OfflineBrowser: Preview Release with Search: I've added search to this release.VG-Ripper & PG-Ripper: VG-Ripper 2.9.46: changes FIXED LoginMath.NET Numerics: Math.NET Numerics v2.6.0: What's New in Math.NET Numerics 2.6 - Announcement, Explanations and Sample Code. New: Linear Curve Fitting Linear least-squares fitting (regression) to lines, polynomials and linear combinations of arbitrary functions. Multi-dimensional fitting. Also works well in F# with the F# extensions. New: Root Finding Brent's method. ~Candy Chiu, Alexander Täschner Bisection method. ~Scott Stephens, Alexander Täschner Broyden's method, for multi-dimensional functions. ~Alexander Täschner ...AJAX Control Toolkit: July 2013 Release: AJAX Control Toolkit Release Notes - July 2013 Release Version 7.0725July 2013 release of the AJAX Control Toolkit. AJAX Control Toolkit .NET 4.5 – AJAX Control Toolkit for .NET 4.5 and sample site (Recommended). AJAX Control Toolkit .NET 4 – AJAX Control Toolkit for .NET 4 and sample site (Recommended). AJAX Control Toolkit .NET 3.5 – AJAX Control Toolkit for .NET 3.5 and sample site (Recommended). Notes: - Instructions for using the AJAX Control Toolkit with ASP.NET 4.5 can be found at...MJP's DirectX 11 Samples: Specular Antialiasing Sample: Sample code to complement my presentation that's part of the Physically Based Shading in Theory and Practice course at SIGGRAPH 2013, entitled "Crafting a Next-Gen Material Pipeline for The Order: 1886". Demonstrates various methods of preventing aliasing from specular BRDF's when using high-frequency normal maps. The zip file contains source code as well as a pre-compiled x64 binary.New ProjectsA Simple Java Encryption program: This is my very first Java project. It's a file encryptor. It's purpose is to codify a file to make it look like it contain random information.AA??: aa??,????。Browser Chooser 2: Browser Chooser 2 is an updated fork of the original. It's primary goal is to simplify using multiple browser.campuscloud-mobile: Dieses Projekt enthält die mobilen Apps zur Campuscloud App.campuscloud-owncloudplugins: Dieses Projekt enthält die ownCloud-Plugins zu den CampuCloud-Apps, die ownCloud-Server um weitere Funktionalitäten ergänzen.campuscloud-windowsphone8: Diese Projekt enthält die Windows Phone 8 App zur Campuscloud App.Chronos .Net Performance Profiler: Free .Net Performance ProfilerConvertServer: a pet project i works onDevelopment Tools for Solid Edge: Development tools for Solid Edge.DocAssist: The project is to facilitate user's day-to-day management of her files in a customisable way on Windows System equipped with .NET framework.epe: epeETP 3: prueba de asp mvc para desarrollar conjuntamente en un repositorio tfsHello World in MVC4: Application to testInvoke-MsTest PowerShell Module: A PowerShell module that makes unit testing Visual Studio projects fast and easy. Uses MsTest.exe to launch all test associated with your project or solution.MarathonTP: MarathonTP (TP stands for Transport Protocol) is a lightweight communication protocol specificaly developped for machine to machine communication.Multicopter Simulator: A simulation environment for multicopter built with Microsoft Robotics Developer Studio 4.MVC Rags: A bunch of ASP.NET MVC4 helpersMVVM.EventToCommand: Tommy.MVVM.EventToCommand is a free, open source developer focused event2command via WinRT for MVVM Pattern.RNT.Common: rnt.commonStorageOrizer: Software to manage disk space, e.g. move Programs without damaging function. Subset Sum Problem Solver: SubsetThirdPartyLogin: ?????twitterBootstrapAspNetMVCControls: Asp.net MVC Controls based on twiiter Boostrap( a beautiful html & css framework published by Twitter).Unify: Unify is an automatic IoC Container Configurator, based on layers approach via annotations.V32 Assembler: The assembler for my assembly language for my 32-bit virtual machine with a home-made instruction set.

    Read the article

1