Search Results

Search found 4071 results on 163 pages for 'preg split'.

Page 78/163 | < Previous Page | 74 75 76 77 78 79 80 81 82 83 84 85  | Next Page >

  • Parse CSS out from <style> elements

    - by awj
    Can someone tell me an efficient method of retrieving the CSS between tags on a page of markup in .NET? I've come up with a method which uses recursion, Split() and CompareTo() but is really long-winded, and I feel sure that there must be a far shorter (and more clever) method of doing the same. Please keep in mind that it is possible to have more than one element on a page, and that the element can be either or .

    Read the article

  • Getting #Error value from ssrs reporting

    - by deepa
    I have created a dataset with fields "LastRunBuild" and "project" .The LastRunBuild field contain string of data seperated by commas according to each project. But Some Projects have no value in LastRunBuild field.When i am using this expression " iif(Fields!LastRunBuild.Value=nothing, nothing,Split(Fields!LastRunBuild.Value,",").GetValue(3)) " a #Error value returns every time. Please reply...

    Read the article

  • Are python list comprehensions always a good programming practice?

    - by dln385
    To make the question clear, I'll use a specific example. I have a list of college courses, and each course has a few fields (all of which are strings). The user gives me a string of search terms, and I return a list of courses that match all of the search terms. This can be done in a single list comprehension or a few nested for loops. Here's the implementation. First, the Course class: class Course: def __init__(self, date, title, instructor, ID, description, instructorDescription, *args): self.date = date self.title = title self.instructor = instructor self.ID = ID self.description = description self.instructorDescription = instructorDescription self.misc = args Every field is a string, except misc, which is a list of strings. Here's the search as a single list comprehension. courses is the list of courses, and query is the string of search terms, for example "history project". def searchCourses(courses, query): terms = query.lower().strip().split() return tuple(course for course in courses if all( term in course.date.lower() or term in course.title.lower() or term in course.instructor.lower() or term in course.ID.lower() or term in course.description.lower() or term in course.instructorDescription.lower() or any(term in item.lower() for item in course.misc) for term in terms)) You'll notice that a complex list comprehension is difficult to read. I implemented the same logic as nested for loops, and created this alternative: def searchCourses2(courses, query): terms = query.lower().strip().split() results = [] for course in courses: for term in terms: if (term in course.date.lower() or term in course.title.lower() or term in course.instructor.lower() or term in course.ID.lower() or term in course.description.lower() or term in course.instructorDescription.lower()): break for item in course.misc: if term in item.lower(): break else: continue break else: continue results.append(course) return tuple(results) That logic can be hard to follow too. I have verified that both methods return the correct results. Both methods are nearly equivalent in speed, except in some cases. I ran some tests with timeit, and found that the former is three times faster when the user searches for multiple uncommon terms, while the latter is three times faster when the user searches for multiple common terms. Still, this is not a big enough difference to make me worry. So my question is this: which is better? Are list comprehensions always the way to go, or should complicated statements be handled with nested for loops? Or is there a better solution altogether?

    Read the article

  • How to match a variable list of items separated by commas

    - by user261915
    I want to turn something like this CS 240, CS 246, ECE 222, ... (more or less); Software Engineering students only into ('CS 240', 'CS 246', 'ECE 222', 'ECE 220') in Python, code that matches a single course looks like >>> re.search('([A-Z]{2,5} \d{3})', 'SE 112').groups() ('SE 112',) I prefer a regular expression only method because I have a bunch of other alternate reg exps using '|' to combine them. However, a method with split is acceptable.

    Read the article

  • Getting a substring in Ruby by x number of chars

    - by wotaskd
    I'm trying to produce some Ruby code that will take a string and return a new one, with a number x number of characters removed from its end - these can be actual letters, numbers, spaces etc. Ex: given the following string a_string = "a1wer4zx" I need a simple way to get the same string, minus - say - the 3 last digits. In the case above, that would be "a1wer". The way I'm doing it right now seems very convoluted: an_array = a_string.split(//,(a_string.length-2)) an_array.pop new_string = an_array.join Any ideas?

    Read the article

  • Regex get multiple segments of a string in javascript

    - by dave
    I'm trying to extract some results from a download manager, the format is: [#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s] what I'd like to extract from the above example, would be an array that looks like this: ['4.3','MiB','40','MiB','10%','4.9','MiB','7','s'] I've tried to split this in various combinations, but nothing seems to be right. Would anyone happen to know how to do this or be able to offer suggestions? Thank you!

    Read the article

  • How can I avoid huge communication classes in WCF?

    - by mafutrct
    My understanding is that all contract-implementing code has to be in a single class, that can become very large, obviously. How do I avoid this? I really prefer to have a few small classes doing one part of the communication with clients than a single behemoth class. The only idea I could think of is using multiple interfaces implemented by a single class split up by partial, but I don't this this is really solving the issue.

    Read the article

  • How to "grep" out specific line ranges of a file

    - by Mike
    There are often times I'll grep -l whatev file to find what I'm looking for. Say the output is 1234: whatev 1 5555: whatev 2 6643: whatev 3 If I want to then just extract the lines between 1234 and 5555, is there a tool to do that? For static files I have a script that does wc -l of the file and then does the math to split it out with tail & head but that doesn't work out so well with log files that are constantly being written to.

    Read the article

  • How to store opening weekdays in a database

    - by JoaoPedro
    I have a group of checkboxes where the user selects some of the weekdays (the opening days of a store). How can I save the selected days? Should I save something like 0111111 (zero means closed on sunday) on the same field and split the result when reading the data? Or create a field for each day and store 0 or 1 on each (weird)?

    Read the article

  • How to match !x but not !!x in JS regex?

    - by mpeterson
    Given the following text: This is!!xa simple string!xpattern I would like to get a regexp that matches the !x that's between "string" and "pattern" but not !!xa that's between "is" and "a". This regexp is to be used inside a string split(). I have tried several combinations but I cannot get a regexp that meets my needs. Perhaps my expression is not so regular after all =) Thanks in advance!

    Read the article

  • vim - open file with complete path

    - by David Oneill
    If I have a file that contains a complete path for a file, is there a way to highlight the filename (using visual mode) and open the file? (preferably in a split screen) As I think about it, here is what I would like: if the file name contains a / character, assume it is a full path (IE the 'current directory' is root). Otherwise, use the current folder (IE default behavior) Is this possible?

    Read the article

  • How can I automatically refactor my classes to use the default namespace for the folder they're in?

    - by Daniel Schaffer
    I've been playing around with the structure of my project, and I'd like to reset the namespaces of my classes to what the default would be. That is, the default namespace for the project, plus each of the folders in the hierarchy. It's not as simple as just find + replace, since I've both added and renamed some folders, and files from some namespaces were split into multiple other namespaces. I'm using VS 2010.

    Read the article

  • Named pipe is using 100% CPU

    - by willwill
    I'm starting the script with ./file.py < pipe >> logfile and the script is: while True: try: I = raw_input().strip().split() except EOFError: continue doSomething() How could I better handle named pipe? This script always run at 100% CPU and it need to be real-time so I cannot use time.sleep.

    Read the article

  • Intelligent "Subtraction" of one text logfile from another

    - by Vi
    Example: Application generates large text log file A with many different messages. It generates similarly large log file B when does not function correctly. I want to see what messages in file B are essentially new, i.e. to filter-out everything from A. Trivial prototype is: Sort | uniq both files Join files sort | uniq -c grep -v "^2" This produces symmetric difference and inconvenient. How to do it better? (including non-symmetric difference and preserving of messages order in B) Program should first analyse A and learn which messages are common, then analyse B showing with messages needs attention. Ideally it should automatically disregard things like timestamps, line numbers or other volatile things. Example. A: 0:00:00.234 Received buffer 0x324234 0:00:00.237 Processeed buffer 0x324234 0:00:00.238 Send buffer 0x324255 0:00:03.334 Received buffer 0x324255 0:00:03.337 Processeed buffer 0x324255 0:00:03.339 Send buffer 0x324255 0:00:05.171 Received buffer 0x32421A 0:00:05.173 Processeed buffer 0x32421A 0:00:05.178 Send buffer 0x32421A B: 0:00:00.134 Received buffer 0x324111 0:00:00.137 Processeed buffer 0x324111 0:00:00.138 Send buffer 0x324111 0:00:03.334 Received buffer 0x324222 0:00:03.337 Processeed buffer 0x324222 0:00:03.338 Error processing buffer 0x324222 0:00:03.339 Send buffer 0x3242222 0:00:05.271 Received buffer 0x3242FA 0:00:05.273 Processeed buffer 0x3242FA 0:00:05.278 Send buffer 0x3242FA 0:00:07.280 Send buffer 0x3242FA failed Result: 0:00:03.338 Error processing buffer 0x324222 0:00:07.280 Send buffer 0x3242FA failed One of ways of solving it can be something like that: Split each line to logical units: 0:00:00.134 Received buffer 0x324111,0:00:00.134,Received,buffer,0x324111,324111,Received buffer, \d:\d\d:\d\d\.\d\d\d, \d+:\d+:\d+.\d+, 0x[0-9A-F]{6}, ... It should find individual words, simple patterns in numbers, common layouts (e.g. "some date than text than number than text than end_of_line"), also handle combinations of above. As it is not easy task, user assistance (adding regexes with explicit "disregard that","make the main factor","don't split to parts","consider as date/number","take care of order/quantity of such messages" rules) should be supported (but not required) for it. Find recurring units and "categorize" lines, filter out too volatile things like timestamps, addresses or line numbers. Analyse the second file, find things that has new logical units (one-time or recurring), or anything that will "amaze" the system which has got used to the first file. Example of doing some bit of this manually: $ cat A | head -n 1 0:00:00.234 Received buffer 0x324234 $ cat A | egrep -v "Received buffer" | head -n 1 0:00:00.237 Processeed buffer 0x324234 $ cat A | egrep -v "Received buffer|Processeed buffer" | head -n 1 0:00:00.238 Send buffer 0x324255 $ cat A | egrep -v "Received buffer|Processeed buffer|Send buffer" | head -n 1 $ cat B | egrep -v "Received buffer|Processeed buffer|Send buffer" 0:00:03.338 Error processing buffer 0x324222 0:00:07.280 Send buffer 0x3242FA failed This is a boring thing (there are a lot of message types); also I can accidentally include some too broad pattern. Also it can't handle complicated things like interrelation between messages. I know that it is AI-related. May be there are already developed tools?

    Read the article

  • GNU/Linux swapping blocks system

    - by Ole Tange
    I have used GNU/Linux on systems from 4 MB RAM to 512 GB RAM. When they start swapping, most of the time you can still log in and kill off the offending process - you just have to be 100-1000 times more patient. On my new 32 GB system that has changed: It blocks when it starts swapping. Sometimes with full disk activity but other times with no disk activity. To examine what might be the issue I have written this program. The idea is: 1 grab 3% of the memory free right now 2 if that caused swap to increase: stop 3 keep the chunk used for 30 seconds by forking off 4 goto 1 - #!/usr/bin/perl sub freekb { my $free = `free|grep buffers/cache`; my @a=split / +/,$free; return $a[3]; } sub swapkb { my $swap = `free|grep Swap:`; my @a=split / +/,$swap; return $a[2]; } my $swap = swapkb(); my $lastswap = $swap; my $free; while($lastswap >= $swap) { print "$swap $free"; $lastswap = $swap; $swap = swapkb(); $free = freekb(); my $used_mem = "x"x(1024 * $free * 0.03); if(not fork()) { sleep 30; exit(); } } print "Swap increased $swap $lastswap\n"; Running the program forever ought to keep the system at the limit of swapping, but only grabbing a minimal amount of swap and do that very slowly (i.e. a few MB at a time at most). If I run: forever free | stdbuf -o0 timestamp > freelog I ought to see swap slowly rising every second. (forever and timestamp from https://github.com/ole-tange/tangetools). But that is not the behaviour I see: I see swap increasing in jumps and that the system is completely blocked during these jumps. Here the system is blocked for 30 seconds with the swap usage increases with 1 GB: secs 169.527 Swap: 18440184 154184 18286000 170.531 Swap: 18440184 154184 18286000 200.630 Swap: 18440184 1134240 17305944 210.259 Swap: 18440184 1076228 17363956 Blocked: 21 secs. Swap increase 2400 MB: 307.773 Swap: 18440184 581324 17858860 308.799 Swap: 18440184 597676 17842508 330.103 Swap: 18440184 2503020 15937164 331.106 Swap: 18440184 2502936 15937248 Blocked: 20 secs. Swap increase 2200 MB: 751.283 Swap: 18440184 885288 17554896 752.286 Swap: 18440184 911676 17528508 772.331 Swap: 18440184 3193532 15246652 773.333 Swap: 18440184 1404540 17035644 Blocked: 37 secs. Swap increase 2400 MB: 904.068 Swap: 18440184 613108 17827076 905.072 Swap: 18440184 610368 17829816 942.424 Swap: 18440184 3014668 15425516 942.610 Swap: 18440184 2073580 16366604 This is bad enough, but what is even worse is that the system sometimes stops responding at all - even if I wait for hours. I have the feeling it is related to the swapping issue, but I cannot tell for sure. My first idea was to tweak /proc/sys/vm/swappiness from 60 to 0 or 100, just to see if that had any effect at all. 0 did not have an effect, but 100 did cause the problem to arise less often. How can I prevent the system from blocking for such a long time? Why does it decide to swapout 1-3 GB when less than 10 MB would suffice?

    Read the article

  • Kernel panic when bringing up DRBD resource

    - by sc.
    I'm trying to set up two machines synchonizing with DRBD. The storage is setup as follows: PV - LVM - DRBD - CLVM - GFS2. DRBD is set up in dual primary mode. The first server is set up and running fine in primary mode. The drives on the first server have data on them. I've set up the second server and I'm trying to bring up the DRBD resources. I created all the base LVM's to match the first server. After initializing the resources with `` drbdadm create-md storage I'm bringing up the resources by issuing drbdadm up storage After issuing that command, I get a kernel panic and the server reboots in 30 seconds. Here's a screen capture. My configuration is as follows: OS: CentOS 6 uname -a Linux host.structuralcomponents.net 2.6.32-279.5.2.el6.x86_64 #1 SMP Fri Aug 24 01:07:11 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux rpm -qa | grep drbd kmod-drbd84-8.4.1-2.el6.elrepo.x86_64 drbd84-utils-8.4.1-2.el6.elrepo.x86_64 cat /etc/drbd.d/global_common.conf global { usage-count yes; # minor-count dialog-refresh disable-ip-verification } common { handlers { pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b /proc/sysrq-trigger ; reboot -f"; pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b /proc/sysrq-trigger ; reboot -f"; local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o /proc/sysrq-trigger ; halt -f"; # fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; # split-brain "/usr/lib/drbd/notify-split-brain.sh root"; # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root"; # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k"; # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh; } startup { # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb become-primary-on both; wfc-timeout 30; degr-wfc-timeout 10; outdated-wfc-timeout 10; } options { # cpu-mask on-no-data-accessible } disk { # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes # disk-drain md-flushes resync-rate resync-after al-extents # c-plan-ahead c-delay-target c-fill-target c-max-rate # c-min-rate disk-timeout } net { # protocol timeout max-epoch-size max-buffers unplug-watermark # connect-int ping-int sndbuf-size rcvbuf-size ko-count # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri # after-sb-1pri after-sb-2pri always-asbp rr-conflict # ping-timeout data-integrity-alg tcp-cork on-congestion # congestion-fill congestion-extents csums-alg verify-alg # use-rle protocol C; allow-two-primaries yes; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; } } cat /etc/drbd.d/storage.res resource storage { device /dev/drbd0; meta-disk internal; on host.structuralcomponents.net { address 10.10.1.120:7788; disk /dev/vg_storage/lv_storage; } on host2.structuralcomponents.net { address 10.10.1.121:7788; disk /dev/vg_storage/lv_storage; } /var/log/messages is not logging anything about the crash. I've been trying to find a cause of this but I've come up with nothing. Can anyone help me out? Thanks.

    Read the article

  • Can spliting an access database cause printer and reporting issues?

    - by leeand00
    We have a setup in which our users log into an access database using MS Access 2003 over an RDP connection. The user's login to their own machines first using a roaming profile. They then click an rdp connection file on the desktop and login to the remote server, via RDP, where they use MS Access as the shell; they don't have any access to any of explorer.exe features such as the start menu. The database they are logging into is more of an application, and provides functionality for entering data, querying data, and running reports via form based menus. It all worked pretty well until we split the database as it was nearing 2GBs in size. We moved out the payroll data into a separate partition, a database with the same name in a different folder, both of them on the server. Only two tables were moved into this new database partition, and they were re-linked as external tables in the new partition. Now while everything appears to be working fine data-wise after the split, there's a new issue when our users login via RDP and attempt to run reports: often the report will not display and instead the user sees an error about the click event of the form. At first I didn't even know it was printer-related, as we didn't really change anything related to the printers as far as I knew. Confused about the error, I talked to the guy who previously worked here and who was in charge of splitting the database, and he told me to tell the users to set their default printers (on their local machines, not on the server) to the "printer" Microsoft XPS Document Writer which isn't a physical printer at all. This allowed the user's to display their reports, but if they want to print out reports, they are required to go to the File menu and select Print, clicking the print icon on the toolbar takes them to a Save As... dialog as would be expected when using the Microsoft XPS Document Writer as your default printer. It's easy to tell if the user is having a problem because a quick mouseover of the printer icon will yield a tooltip of (none) when they cannot access their reports, and a tooltip of Microsoft XPS Document Writer when they can view the reports. If the user's printer is set to anything other than Microsoft XPS Document Writer as the default on their local machine, then (none) is always displayed when they rdp to the database. The RDP settings are setup to transfer the local printer to the server. Telling the users to do this to print has been more of a band-aid on the whole situation until we find a better solution and an explanation as to why splitting a database would prevent users from printing or even viewing access database reports. Which is why I'm here asking this question. Also of note all the printers on the network now show up on the server so that when the users do click File->Print to print their reports on a physical printer, they have to look through a huge list of printers to find theirs in the dropdown. So the little band-aid fix we have is not ideal. Previously, only the printers on the user's local machine displayed here, and not all the printers on the network. My co-worker seems to think this has something to do with permissions, I personally think it has to do with roaming profiles, and Group Policies which is what I've been reading up on. I really don't know how to fix this or how it is related to splitting the database.

    Read the article

  • Parallelism in .NET – Part 5, Partitioning of Work

    - by Reed
    When parallelizing any routine, we start by decomposing the problem.  Once the problem is understood, we need to break our work into separate tasks, so each task can be run on a different processing element.  This process is called partitioning. Partitioning our tasks is a challenging feat.  There are opposing forces at work here: too many partitions adds overhead, too few partitions leaves processors idle.  Trying to work the perfect balance between the two extremes is the goal for which we should aim.  Luckily, the Task Parallel Library automatically handles much of this process.  However, there are situations where the default partitioning may not be appropriate, and knowledge of our routines may allow us to guide the framework to making better decisions. First off, I’d like to say that this is a more advanced topic.  It is perfectly acceptable to use the parallel constructs in the framework without considering the partitioning taking place.  The default behavior in the Task Parallel Library is very well-behaved, even for unusual work loads, and should rarely be adjusted.  I have found few situations where the default partitioning behavior in the TPL is not as good or better than my own hand-written partitioning routines, and recommend using the defaults unless there is a strong, measured, and profiled reason to avoid using them.  However, understanding partitioning, and how the TPL partitions your data, helps in understanding the proper usage of the TPL. I indirectly mentioned partitioning while discussing aggregation.  Typically, our systems will have a limited number of Processing Elements (PE), which is the terminology used for hardware capable of processing a stream of instructions.  For example, in a standard Intel i7 system, there are four processor cores, each of which has two potential hardware threads due to Hyperthreading.  This gives us a total of 8 PEs – theoretically, we can have up to eight operations occurring concurrently within our system. In order to fully exploit this power, we need to partition our work into Tasks.  A task is a simple set of instructions that can be run on a PE.  Ideally, we want to have at least one task per PE in the system, since fewer tasks means that some of our processing power will be sitting idle.  A naive implementation would be to just take our data, and partition it with one element in our collection being treated as one task.  When we loop through our collection in parallel, using this approach, we’d just process one item at a time, then reuse that thread to process the next, etc.  There’s a flaw in this approach, however.  It will tend to be slower than necessary, often slower than processing the data serially. The problem is that there is overhead associated with each task.  When we take a simple foreach loop body and implement it using the TPL, we add overhead.  First, we change the body from a simple statement to a delegate, which must be invoked.  In order to invoke the delegate on a separate thread, the delegate gets added to the ThreadPool’s current work queue, and the ThreadPool must pull this off the queue, assign it to a free thread, then execute it.  If our collection had one million elements, the overhead of trying to spawn one million tasks would destroy our performance. The answer, here, is to partition our collection into groups, and have each group of elements treated as a single task.  By adding a partitioning step, we can break our total work into small enough tasks to keep our processors busy, but large enough tasks to avoid overburdening the ThreadPool.  There are two clear, opposing goals here: Always try to keep each processor working, but also try to keep the individual partitions as large as possible. When using Parallel.For, the partitioning is always handled automatically.  At first, partitioning here seems simple.  A naive implementation would merely split the total element count up by the number of PEs in the system, and assign a chunk of data to each processor.  Many hand-written partitioning schemes work in this exactly manner.  This perfectly balanced, static partitioning scheme works very well if the amount of work is constant for each element.  However, this is rarely the case.  Often, the length of time required to process an element grows as we progress through the collection, especially if we’re doing numerical computations.  In this case, the first PEs will finish early, and sit idle waiting on the last chunks to finish.  Sometimes, work can decrease as we progress, since previous computations may be used to speed up later computations.  In this situation, the first chunks will be working far longer than the last chunks.  In order to balance the workload, many implementations create many small chunks, and reuse threads.  This adds overhead, but does provide better load balancing, which in turn improves performance. The Task Parallel Library handles this more elaborately.  Chunks are determined at runtime, and start small.  They grow slowly over time, getting larger and larger.  This tends to lead to a near optimum load balancing, even in odd cases such as increasing or decreasing workloads.  Parallel.ForEach is a bit more complicated, however. When working with a generic IEnumerable<T>, the number of items required for processing is not known in advance, and must be discovered at runtime.  In addition, since we don’t have direct access to each element, the scheduler must enumerate the collection to process it.  Since IEnumerable<T> is not thread safe, it must lock on elements as it enumerates, create temporary collections for each chunk to process, and schedule this out.  By default, it uses a partitioning method similar to the one described above.  We can see this directly by looking at the Visual Partitioning sample shipped by the Task Parallel Library team, and available as part of the Samples for Parallel Programming.  When we run the sample, with four cores and the default, Load Balancing partitioning scheme, we see this: The colored bands represent each processing core.  You can see that, when we started (at the top), we begin with very small bands of color.  As the routine progresses through the Parallel.ForEach, the chunks get larger and larger (seen by larger and larger stripes). Most of the time, this is fantastic behavior, and most likely will out perform any custom written partitioning.  However, if your routine is not scaling well, it may be due to a failure in the default partitioning to handle your specific case.  With prior knowledge about your work, it may be possible to partition data more meaningfully than the default Partitioner. There is the option to use an overload of Parallel.ForEach which takes a Partitioner<T> instance.  The Partitioner<T> class is an abstract class which allows for both static and dynamic partitioning.  By overriding Partitioner<T>.SupportsDynamicPartitions, you can specify whether a dynamic approach is available.  If not, your custom Partitioner<T> subclass would override GetPartitions(int), which returns a list of IEnumerator<T> instances.  These are then used by the Parallel class to split work up amongst processors.  When dynamic partitioning is available, GetDynamicPartitions() is used, which returns an IEnumerable<T> for each partition.  If you do decide to implement your own Partitioner<T>, keep in mind the goals and tradeoffs of different partitioning strategies, and design appropriately. The Samples for Parallel Programming project includes a ChunkPartitioner class in the ParallelExtensionsExtras project.  This provides example code for implementing your own, custom allocation strategies, including a static allocator of a given chunk size.  Although implementing your own Partitioner<T> is possible, as I mentioned above, this is rarely required or useful in practice.  The default behavior of the TPL is very good, often better than any hand written partitioning strategy.

    Read the article

  • SSIS Lookup component tuning tips

    - by jamiet
    Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now. Scene setting A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time and is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have then the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible. General tips Here is a simple tick list you can follow in order to tune your lookups: Use a SQL statement to charge your cache, don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?) Only pick the columns that you need, ignore everything else Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length. Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR so if you can use DT_STR, consider doing so. Same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8. Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause! Thinking outside the box It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often though you can make better use of your available resources by, well, mixing things up a little and here are a few ideas to get your creative juices flowing: There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow. Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses a NoCache lookup thus negating the need to pull all of that least-used lookup data into memory. Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId] then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement you’ll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop. Taking the previous data partitioning idea further … a dataflow can contain more than one data path so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition. Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all once you have the key column(s) from your lookup set then you can use that key to get the rest of attributes further downstream, perhaps even in another dataflow. Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond. Above all, experiment and be creative with different combinations. You may be surprised at what works. Final  thoughts If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005 then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • The new workflow management of Oracle´s Hyperion Planning: Define more details with Planning Unit Hierarchies and Promotional Paths

    - by Alexandra Georgescu
    After having been almost unchanged for several years, starting with the 11.1.2 release of Oracle´s Hyperion Planning the Process Management has not only got a new name: “Approvals” now is offering the possibility to further split Planning Units (comprised of a unique Scenario-Version-Entity combination) into more detailed combinations along additional secondary dimensions, a so called Planning Unit Hierarchy, and also to pre-define a path of planners, reviewers and approvers, called Promotional Path. I´d like to introduce you to changes and enhancements in this new process management and arouse your curiosity for checking out more details on it. One reason of using the former process management in Planning was to limit data entry rights to one person at a time based on the assignment of a planning unit. So the lowest level of granularity for this assignment was, for a given Scenario-Version combination, the individual entity. Even if in many cases one person wasn´t responsible for all data being entered into that entity, but for only part of it, it was not possible to split the ownership along another additional dimension, for example by assigning ownership to different accounts at the same time. By defining a so called Planning Unit Hierarchy (PUH) in Approvals this gap is now closed. Complementing new Shared Services roles for Planning have been created in order to manage set up and use of Approvals: The Approvals Administrator consisting of the following roles: Approvals Ownership Assigner, who assigns owners and reviewers to planning units for which Write access is assigned (including Planner responsibilities). Approvals Supervisor, who stops and starts planning units and takes any action on planning units for which Write access is assigned. Approvals Process Designer, who can modify planning unit hierarchy secondary dimensions and entity members for which Write access is assigned, can also modify scenarios and versions that are assigned to planning unit hierarchies and can edit validation rules on data forms for which access is assigned. (this includes as well Planner and Ownership Assigner responsibilities) Set up of a Planning Unit Hierarchy is done under the Administration menu, by selecting Approvals, then Planning Unit Hierarchy. Here you create new PUH´s or edit existing ones. The following window displays: After providing a name and an optional description, a pre-selection of entities can be made for which the PUH will be defined. Available options are: All, which pre-selects all entities to be included for the definitions on the subsequent tabs None, manual entity selections will be made subsequently Custom, which offers the selection for an ancestor and the relative generations, that should be included for further definitions. Finally a pattern needs to be selected, which will determine the general flow of ownership: Free-form, uses the flow/assignment of ownerships according to Planning releases prior to 11.1.2 In Bottom-up, data input is done at the leaf member level. Ownership follows the hierarchy of approval along the entity dimension, including refinements using a secondary dimension in the PUH, amended by defined additional reviewers in the promotional path. Distributed, uses data input at the leaf level, while ownership starts at the top level and then is distributed down the organizational hierarchy (entities). After ownership reaches the lower levels, budgets are submitted back to the top through the approval process. Proceeding to the next step, now a secondary dimension and the respective members from that dimension might be selected, in order to create more detailed combinations underneath each entity. After selecting the Dimension and a Parent Member, the definition of a Relative Generation below this member assists in populating the field for Selected Members, while the Count column shows the number of selected members. For refining this list, you might click on the icon right beside the selected member field and use the check-boxes in the appearing list for deselecting members. -------------------------------------------------------------------------------------------------------- TIP: In order to reduce maintenance of the PUH due to changes in the dimensions included (members added, moved or removed) you should consider to dynamically link those dimensions in the PUH with the dimension hierarchies in the planning application. For secondary dimensions this is done using the check-boxes in the Auto Include column. For the primary dimension, the respective selection criteria is applied by right-clicking the name of an entity activated as planning unit, then selecting an item of the shown list of include or exclude options (children, descendants, etc.). Anyway in order to apply dimension changes impacting the PUH a synchronization must be run. If this is really necessary or not is shown on the first screen after selecting from the menu Administration, then Approvals, then Planning Unit Hierarchy: under Synchronized you find the statuses Yes, No or Locked, where the last one indicates, that another user is just changing or synchronizing the PUH. Select one of the not synchronized PUH´s (status No) and click the Synchronize option in order to execute. -------------------------------------------------------------------------------------------------------- In the next step owners and reviewers are assigned to the PUH. Using the icons with the magnifying glass right besides the columns for Owner and Reviewer the respective assignments can be made in the ordermthat you want them to review the planning unit. While it is possible to assign only one owner per entity or combination of entity+ member of the secondary dimension, the selection for reviewers might consist of more than one person. The complete Promotional Path, including the defined owners and reviewers for the entity parents, can be shown by clicking the icon. In addition optional users might be defined for being notified about promotions for a planning unit. -------------------------------------------------------------------------------------------------------- TIP: Reviewers cannot change data, but can only review data according to their data access permissions and reject or promote planning units. -------------------------------------------------------------------------------------------------------- In order to complete your PUH definitions click Finish - this saves the PUH and closes the window. As a final step, before starting the approvals process, you need to assign the PUH to the Scenario-Version combination for which it should be used. From the Administration menu select Approvals, then Scenario and Version Assignment. Expand the PUH in order to see already existing assignments. Under Actions click the add icon and select scenarios and versions to be assigned. If needed, click the remove icon in order to delete entries. After these steps, set up is completed for starting the approvals process. Start, stop and control of the approvals process is now done under the Tools menu, and then Manage Approvals. The new PUH feature is complemented by various additional settings and features; some of them at least should be mentioned here: Export/Import of PUHs: Out of Office agent: Validation Rules changing promotional/approval path if violated (including the use of User-defined Attributes (UDAs)): And various new and helpful reviewer actions with corresponding approval states. About the Author: Bernhard Kinkel started working for Hyperion Solutions as a Presales Consultant and Consultant in 1998 and moved to Hyperion Education Services in 1999. He joined Oracle University in 2007 where he is a Principal Education Consultant. Based on these many years of working with Hyperion products he has detailed product knowledge across several versions. He delivers both classroom and live virtual courses. His areas of expertise are Oracle/Hyperion Essbase, Oracle Hyperion Planning and Hyperion Web Analysis.

    Read the article

  • Validating a linked item&rsquo;s data template in Sitecore

    - by Kyle Burns
    I’ve been doing quite a bit of work in Sitecore recently and last week I encountered a situation that it appears many others have hit.  I was working with a field that had been configured originally as a grouped droplink, but now needed to be updated to support additional levels of hierarchy in the folder structure.  If you’ve done any work in Sitecore that statement makes sense, but if not it may seem a bit cryptic.  Sitecore offers a number of different field types and a subset of these field types focus on providing links either to other items on the content tree or to content that is not stored in Sitecore.  In the case of the grouped droplink, the field is configured with a “root” folder and each direct descendant of this folder is considered to be a header for a grouping of other items and displayed in a dropdown.  A picture is worth a thousand words, so consider the following piece of a content tree: If I configure a grouped droplink field to use the “Current” folder as its datasource, the control that gets to my content author looks like this: This presents a nicely organized display and limits the user to selecting only the direct grandchildren of the folder root.  It also presents the limitation that struck as we were thinking through the content architecture and how it would hold up over time – the authors cannot further organize content under the root folder because of the structure required for the dropdown to work.  Over time, not allowing the hierarchy to go any deeper would prevent out authors from being able to organize their content in a way that it would be found when needed, so the grouped droplink data type was not going to fit the bill. I needed to look for an alternative data type that allowed for selection of a single item and limited my choices to descendants of a specific node on the content tree.  After looking at the options available for links in Sitecore and considering them against each other, one option stood out as nearly perfect – the droptree.  This field type stores its data identically to the droplink and allows for the selection of zero or one items under a specific node in the content tree.  By changing my data template to use droptree instead of grouped droplink, the author is now presented with the following when selecting a linked item: Sounds great, but a did say almost perfect – there’s still one flaw.  The code intended to display the linked item is expecting the selection to use a specific data template (or more precisely it makes certain assumptions about the fields that will be present), but the droptree does nothing to prevent the author from selecting a folder (since folders are items too) instead of one of the items contained within a folder.  I looked to see if anyone had already solved this problem.  I found many people discussing the problem, but the closest that I found to a solution was the statement “the best thing would probably be to create a custom validator” with no further discussion in regards to what this validator might look like.  I needed to create my own validator to ensure that the user had not selected a folder.  Since so many people had the same issue, I decided to make the validator as reusable as possible and share it here. The validator that I created inherits from StandardValidator.  In order to make the validator more intuitive to developers that are familiar with the TreeList controls in Sitecore, I chose to implement the following parameters: ExcludeTemplatesForSelection – serves as a “deny list”.  If the data template of the selected item is in this list it will not validate IncludeTemplatesForSelection – this can either be empty to indicate that any template not contained in the exclusion list is acceptable or it can contain the list of acceptable templates Now that I’ve explained the parameters and the purpose of the validator, I’ll let the code do the rest of the talking: 1: /// <summary> 2: /// Validates that a link field value meets template requirements 3: /// specified using the following parameters: 4: /// - ExcludeTemplatesForSelection: If present, the item being 5: /// based on an excluded template will cause validation to fail. 6: /// - IncludeTemplatesForSelection: If present, the item not being 7: /// based on an included template will cause validation to fail 8: /// 9: /// ExcludeTemplatesForSelection trumps IncludeTemplatesForSelection 10: /// if the same value appears in both lists. Lists are comma seperated 11: /// </summary> 12: [Serializable] 13: public class LinkItemTemplateValidator : StandardValidator 14: { 15: public LinkItemTemplateValidator() 16: { 17: } 18:   19: /// <summary> 20: /// Serialization constructor is required by the runtime 21: /// </summary> 22: /// <param name="info"></param> 23: /// <param name="context"></param> 24: public LinkItemTemplateValidator(SerializationInfo info, StreamingContext context) : base(info, context) { } 25:   26: /// <summary> 27: /// Returns whether the linked item meets the template 28: /// constraints specified in the parameters 29: /// </summary> 30: /// <returns> 31: /// The result of the evaluation. 32: /// </returns> 33: protected override ValidatorResult Evaluate() 34: { 35: if (string.IsNullOrWhiteSpace(ControlValidationValue)) 36: { 37: return ValidatorResult.Valid; // let "required" validation handle 38: } 39:   40: var excludeString = Parameters["ExcludeTemplatesForSelection"]; 41: var includeString = Parameters["IncludeTemplatesForSelection"]; 42: if (string.IsNullOrWhiteSpace(excludeString) && string.IsNullOrWhiteSpace(includeString)) 43: { 44: return ValidatorResult.Valid; // "allow anything" if no params 45: } 46:   47: Guid linkedItemGuid; 48: if (!Guid.TryParse(ControlValidationValue, out linkedItemGuid)) 49: { 50: return ValidatorResult.Valid; // probably put validator on wrong field 51: } 52:   53: var item = GetItem(); 54: var linkedItem = item.Database.GetItem(new ID(linkedItemGuid)); 55:   56: if (linkedItem == null) 57: { 58: return ValidatorResult.Valid; // this validator isn't for broken links 59: } 60:   61: var exclusionList = (excludeString ?? string.Empty).Split(','); 62: var inclusionList = (includeString ?? string.Empty).Split(','); 63:   64: if ((inclusionList.Length == 0 || inclusionList.Contains(linkedItem.TemplateName)) 65: && !exclusionList.Contains(linkedItem.TemplateName)) 66: { 67: return ValidatorResult.Valid; 68: } 69:   70: Text = GetText("The field \"{0}\" specifies an item which is based on template \"{1}\". This template is not valid for selection", GetFieldDisplayName(), linkedItem.TemplateName); 71:   72: return GetFailedResult(ValidatorResult.FatalError); 73: } 74:   75: protected override ValidatorResult GetMaxValidatorResult() 76: { 77: return ValidatorResult.FatalError; 78: } 79:   80: public override string Name 81: { 82: get { return @"LinkItemTemplateValidator"; } 83: } 84: }   In this blog entry, I have shared some code that I found useful in solving a problem that seemed fairly common.  Hopefully the next person that is looking for this answer finds it useful as well.

    Read the article

  • MySQL – Scalability on Amazon RDS: Scale out to multiple RDS instances

    - by Pinal Dave
    Today, I’d like to discuss getting better MySQL scalability on Amazon RDS. The question of the day: “What can you do when a MySQL database needs to scale write-intensive workloads beyond the capabilities of the largest available machine on Amazon RDS?” Let’s take a look. In a typical EC2/RDS set-up, users connect to app servers from their mobile devices and tablets, computers, browsers, etc.  Then app servers connect to an RDS instance (web/cloud services) and in some cases they might leverage some read-only replicas.   Figure 1. A typical RDS instance is a single-instance database, with read replicas.  This is not very good at handling high write-based throughput. As your application becomes more popular you can expect an increasing number of users, more transactions, and more accumulated data.  User interactions can become more challenging as the application adds more sophisticated capabilities. The result of all this positive activity: your MySQL database will inevitably begin to experience scalability pressures. What can you do? Broadly speaking, there are four options available to improve MySQL scalability on RDS. 1. Larger RDS Instances – If you’re not already using the maximum available RDS instance, you can always scale up – to larger hardware.  Bigger CPUs, more compute power, more memory et cetera. But the largest available RDS instance is still limited.  And they get expensive. “High-Memory Quadruple Extra Large DB Instance”: 68 GB of memory 26 ECUs (8 virtual cores with 3.25 ECUs each) 64-bit platform High I/O Capacity Provisioned IOPS Optimized: 1000Mbps 2. Provisioned IOPs – You can get provisioned IOPs and higher throughput on the I/O level. However, there is a hard limit with a maximum instance size and maximum number of provisioned IOPs you can buy from Amazon and you simply cannot scale beyond these hardware specifications. 3. Leverage Read Replicas – If your application permits, you can leverage read replicas to offload some reads from the master databases. But there are a limited number of replicas you can utilize and Amazon generally requires some modifications to your existing application. And read-replicas don’t help with write-intensive applications. 4. Multiple Database Instances – Amazon offers a fourth option: “You can implement partitioning,thereby spreading your data across multiple database Instances” (Link) However, Amazon does not offer any guidance or facilities to help you with this. “Multiple database instances” is not an RDS feature.  And Amazon doesn’t explain how to implement this idea. In fact, when asked, this is the response on an Amazon forum: Q: Is there any documents that describe the partition DB across multiple RDS? I need to use DB with more 1TB but exist a limitation during the create process, but I read in the any FAQ that you need to partition database, but I don’t find any documents that describe it. A: “DB partitioning/sharding is not an official feature of Amazon RDS or MySQL, but a technique to scale out database by using multiple database instances. The appropriate way to split data depends on the characteristics of the application or data set. Therefore, there is no concrete and specific guidance.” So now what? The answer is to scale out with ScaleBase. Amazon RDS with ScaleBase: What you get – MySQL Scalability! ScaleBase is specifically designed to scale out a single MySQL RDS instance into multiple MySQL instances. Critically, this is accomplished with no changes to your application code.  Your application continues to “see” one database.   ScaleBase does all the work of managing and enforcing an optimized data distribution policy to create multiple MySQL instances. With ScaleBase, data distribution, transactions, concurrency control, and two-phase commit are all 100% transparent and 100% ACID-compliant, so applications, services and tooling continue to interact with your distributed RDS as if it were a single MySQL instance. The result: now you can cost-effectively leverage multiple MySQL RDS instance to scale out write-intensive workloads to an unlimited number of users, transactions, and data. Amazon RDS with ScaleBase: What you keep – Everything! And how does this change your Amazon environment? 1. Keep your application, unchanged – There is no change your application development life-cycle at all.  You still use your existing development tools, frameworks and libraries.  Application quality assurance and testing cycles stay the same. And, critically, you stay with an ACID-compliant MySQL environment. 2. Keep your RDS value-added services – The value-added services that you rely on are all still available. Amazon will continue to handle database maintenance and updates for you. You can still leverage High Availability via Multi A-Z.  And, if it benefits youra application throughput, you can still use read replicas. 3. Keep your RDS administration – Finally the RDS monitoring and provisioning tools you rely on still work as they did before. With your one large MySQL instance, now split into multiple instances, you can actually use less expensive, smallersmaller available RDS hardware and continue to see better database performance. Conclusion Amazon RDS is a tremendous service, but it doesn’t offer solutions to scale beyond a single MySQL instance. Larger RDS instances get more expensive.  And when you max-out on the available hardware, you’re stuck.  Amazon recommends scaling out your single instance into multiple instances for transaction-intensive apps, but offers no services or guidance to help you. This is where ScaleBase comes in to save the day. It gives you a simple and effective way to create multiple MySQL RDS instances, while removing all the complexities typically caused by “DIY” sharding andwith no changes to your applications . With ScaleBase you continue to leverage the AWS/RDS ecosystem: commodity hardware and value added services like read replicas, multi A-Z, maintenance/updates and administration with monitoring tools and provisioning. SCALEBASE ON AMAZON If you’re curious to try ScaleBase on Amazon, it can be found here – Download NOW. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: MySQL, PostADay, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

< Previous Page | 74 75 76 77 78 79 80 81 82 83 84 85  | Next Page >