Search Results

Search found 23 results on 1 page for 'ifilter'.


  • How to use a specific PDF IFilter

    - by dthrasher
    I'm trying to extract text from PDF files using an iFilter. The Adobe PDF iFilter that is distributed with Adobe Reader is awful, returning HRESULT E_FAIL messages for many PDF documents. The FoxIt PDF IFilter works beautifully on virtually all of the PDFs I've been using for testing. The problem is that every time the Adobe Updater runs, it replaces the awesome FoxIt IFilter with the crappy Adobe IFilter. I've been using the LoadIFilter method to get the registered IFilter for PDF files. Is there a way to force the Win32 API to load the FoxIt IFilter instead of the Adobe IFilter? NOTE: This question about determining which IFilters are installed asks a related -- but not identical -- question.

    Read the article

  • Debugging iFilter plug-in (PDF indexing)

    - by Trevor Sullivan
    I have the official Adobe x64 iFilter PDF plug-in and the FoxIt Software iFilter PDF plug-in installed, and neither one seems to be allowing me to index the contents of PDF files. So far, I've: (1) added my data folder into the Indexing service configuration; (2) ensured that PDF files are configured to index "file properties and contents"; (3) rebuilt the index from scratch. But when I search, I can only search for PDF file names, not their contents. Any ideas on how to debug this issue?

    Read the article

  • Does an IFilter Exist for Indexing Source Code Files?

    - by AMissico
    Anybody know of an IFilter that can index source code files beyond what the "Plain Text" filter can provide, with possibly a custom "Property Set" specific to programming? For example, I have 835MB in 41,000 files and 8,200 folders in my "Code Library" folder. I would like to perform searches such as "select distinct attributes on properties" or "select class exceptions" or "select classes with nested private classes". Preferably, the IFilter can distinguish between various languages, so I can perform a query like "select class exceptions in VB.NET" or "select 'resume next' in VBScript". Other examples: "select all enum from folder('microsoft source code') in namespace 'system.io'"

    Read the article

  • How to best deal with photos passed to IFilter?

    - by sharptooth
    I'm implementing an IFilter for indexing image formats. One problem is photos: many users have tons of photos, photos are huge, and loading them and searching for text in them is time consuming. Yes, sometimes people use cameras instead of scanners for digitizing documents, but the potential problems IMO far outweigh the possibility of encountering a document digitized with a photo camera. So my implementation will not extract text from photos at all. What should the IFilter do once it detects that a given file is a photo image: indicate an error or return empty text?

    Read the article

  • Do I assign different or the same class id to 32-bit and 64-bit versions of the same IFilter?

    - by sharptooth
    I've implemented my own Microsoft Search IFilter. I need two versions of it - 32-bit and 64-bit for deploying them on corresponding systems. In case of IFilters for any file extension I can only register one IFilter class id. Which means I can only use one version on any system. So having two class ids seems useless - it only makes the automatic installer more complex. Do I reuse the same COM class id for both or do I use different class ids?

    Read the article

  • ifilter not working with MOSS 2007, can't crawl .pdf

    - by SORRYPROFESSEROFYEARNING
    Installed ifilter and followed the guide at http://msmvps.com/blogs/sundar_narasiman/archive/2008/02/06/configuring-moss-2007-to-search-pdf-documents-install-and-configure-pdf-ifilters.aspx and the accompanying link to the MS hotfix. I have initiated multiple crawls that don't show any .pdf documents, let alone the contents of the .pdfs (I did constantly upload test documents with real content). In the 'file types' menu of the shared services, it didn't show the pdf icon as I think it was meant to. It also lists 'pdf' as filetype 'AcroExch.Document'; is this correct? Any ideas, anyone?

    Read the article

  • Can I manually map a file extension to an IFilter?

    - by Deane
    I'm working with Microsoft Indexing Service. I have purchased a third-party IFilter to extract XMP metadata from Adobe products. I'm having trouble getting it to work, and it occurs to me that the problem is that I don't actually have the Adobe software installed on my server, so the IFilters are not mapped. Put another way, there's nothing to tell the indexer that ".psd" files should use this DLL rather than the default DLL. Is it possible to manually map file extensions to the IFilter you want to use?

    Read the article

  • Full-Text Search in SQL Server Express Won't Recognize Latest IFilters

    - by Brandon King
    I'm having difficulty getting full-text search working in SQL Server 2008 Express with Advanced Services. I have a table loaded with .DOCX files as varbinary(MAX) data that I want to use for a full-text catalog, but it doesn't seem to recognize the .DOCX format. Here are the steps that I've taken:

      1. Installed the latest Filter Pack 2.0
      2. Exec sp_fulltext_service 'load_os_resources', 1
      3. Exec sys.sp_help_fulltext_system_components 'all' (NOTE: .DOCX is not shown as a filter)
      4. Building the full-text catalog fails to identify any key words

    I initially thought there might be a conflict between x86 SQL Express and x64 Filter Pack on my Windows 7 machine, but I just tried it with everything x86 in a Windows XP virtual machine and got the same result.

    Read the article

  • Kooboo CMS 2.1.1.0 released

    New features: Add new API RssUrl to generate an RSS link; this is an extension to UrlHelper. Add the possibility to index and search attachment content with the Lucene full-text search engine; some attachment types require the IFilter component from Microsoft. Supported file attachments include: .docx, .docm, .pptx, .pptm, .xlsx, .xlsm, .xlsb, .zip, .one, .vdx, .vsd, .vss, .vst, .vdx, .vsx, and .vtx. Please download and install the IFilter from: http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en For...

    Read the article

  • Setting variable values in a Moq Callback() call

    - by Adam Driscoll
    I think I may be a bit confused about the syntax of the Moq Callback methods. When I try to do something like this:

        IFilter filter = new Filter();
        List<IFoo> objects = new List<IFoo> { new Foo(), new Foo() };
        IQueryable myFilteredFoos = null;
        mockObject.Setup(m => m.GetByFilter(It.IsAny<IFilter>()))
            .Callback((IFilter filter) => myFilteredFoos = filter.FilterCollection(objects))
            .Returns(myFilteredFoos.Cast<IFooBar>());

    This throws an exception because myFilteredFoos is null during the Cast<IFooBar>() call. Is this not working as I expect? I would think FilterCollection would be called and then myFilteredFoos would be non-null and allow for the cast. FilterCollection is not capable of returning null, which leads me to the conclusion that it is not being called. Also, when I declare myFilteredFoos like this:

        Queryable myFilteredFoos;

    the Returns call complains that myFilteredFoos may be used before it is initialized.

    Read the article

  • Configuring Full-Text Search for pdf and docx files

    - by Lukasz Kurylo
    I think it was in May that I built a small filters module based on Full-Text Search (FTS). I configured my dev machine, then two testing servers (used in our company for internal testing before we deploy to the client), and then the client's testing server. Until last week this build was still on the testing server, and we finally got feedback that we can deploy it to production. I lost half a day because I could not remember exactly what I had done to configure FTS on the previous servers, and I had no notes. I foolishly trusted my memory. Lesson learned.

    For future reference, here are the steps to configure FTS for searching in *.pdf and *.docx files (and, by the way, in other Office files like *.xlsx):

      1. From the page (link), download and install the *.pdf IFilter for FTS.
      2. Add the directory where you installed the plugin to the global PATH system variable. The default for this version is: C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin
      3. From the page (link), download FilterPackx64.exe and install it.
      4. From SSMS, execute the following procedures:
           sp_fulltext_service 'load_os_resources', 1
           sp_fulltext_service 'verify_signature', 0
      5. Restart the server.
      6. Check that the plugins are visible:
           select document_type, path from sys.fulltext_document_types where document_type = '.pdf'
           select document_type, path from sys.fulltext_document_types where document_type = '.docx'
      7. If we see a result, we can assume that everything is OK*.
      8. Now we can create a catalog for FTS and indexes on the appropriate columns.

    * I lost many hours finding out why the *.pdf plugin wasn't indexing any file in the database, even though sys.fulltext_document_types had a row for it. After deeper investigation I found that the *.pdf files actually were indexed: at least, the EOF sign was added to the index for each file, and nothing more. In the end the problem was that I had forgotten to include /bin in the plugin path in the PATH variable.
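    The PATH pitfall described in the footnote (an entry that stops short of the bin directory) can be checked with a few lines of Python. This is only a sketch: the function name path_contains_dir is hypothetical, the directory constant is the default install location quoted in the post, and the check compares PATH entries as strings; it does not verify that SQL Server actually loads the filter.

```python
import ntpath
import os

# Default install location quoted in the post; adjust for other versions.
IFILTER_BIN = r"C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin"


def path_contains_dir(path_value: str, directory: str) -> bool:
    """Return True if a Windows-style PATH string lists exactly `directory`.

    Comparison ignores case, trailing backslashes and surrounding quotes,
    so a PATH entry pointing at the parent folder does NOT count.
    """
    wanted = ntpath.normcase(ntpath.normpath(directory))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    # Meaningful on the Windows server itself, where PATH uses ';' separators.
    found = path_contains_dir(os.environ.get("PATH", ""), IFILTER_BIN)
    print("iFilter bin directory on PATH:", found)
```

    Run on the server after step 2 and before the restart in step 5; a False result for the bin directory would point to the same mistake the author describes.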

    Read the article

  • C#: How to avoid WIA-error when scanning documents with 2400dpi or more?

    - by Stephan_W
    Hello, when we scan a document with a resolution of 2400dpi or higher, we receive (for example) the following error message:

        COMException: Exception from HRESULT: 0x80010100 (RPC_E_SYS_CALL_FAILED)

    or

        COMException: Exception from HRESULT: 0x8021006F

    in one of the following lines:

        img = itm.Transfer(scanFormat.ScanFormat) as WIA.ImageFile;
        img = ip.Apply(img as WIA.ImageFile);

    Some screenshots of the errors: http://www.amarant-it.de/TempDownload/WIA_Error01.png or the same path with WIA_Error02.png and WIA_Error03.png. For scanning we use the following code:

        #region Image-Convert-Settings
        //IP.Filters.Add IP.FilterInfos("Convert").FilterID
        //IP.Filters(1).Properties("FormatID").Value = wiaFormatJPEG
        WIA.IImageProcess ip = new WIA.ImageProcessClass();
        object convert = "Convert";
        WIA.IFilterInfo fi = ip.FilterInfos.get_Item(ref convert);
        ip.Filters.Add(fi.FilterID, 0);
        convert = "FormatID";
        object formatstring = scanFormat.ScanFormat;
        WIA.IFilter filter;
        foreach (WIA.IFilter fTemp in ip.Filters)
        {
            filter = fTemp;
            WIA.IProperty prop = filter.Properties.get_Item(ref convert);
            prop.set_Value(ref formatstring);
        }
        #endregion

        #region Image-Scan + Convert
        img = itm.Transfer(scanFormat.ScanFormat) as WIA.ImageFile;
        img = ip.Apply(img as WIA.ImageFile);
        img.SaveFile("D:\\scan2." + img.FileExtension);
        Image image = Image.FromFile("D:\\scan2." + img.FileExtension);
        ilImages.Images.Add(image.ToString(), image);
        alImages.Add(image);
        if (ImageScanned != null)
        {
            ImageScanned(image);
        }
        #endregion

    Can anyone help us with this problem? Thanks

    Read the article

  • Moq a function with 5+ parameters and access invocation arguments.

    - by beerncircus
    I have a function I want to Moq. The problem is that it takes 5 parameters. The framework only contains Action<T1,T2,T3,T4>, and Moq's generic Callback() only overloads Action and the four generic versions. Is there an elegant workaround for this? This is what I want to do:

        public class Filter : IFilter
        {
            public int Filter(int i1, int i2, int i3, int i4, int i5) { return 0; }
        }

        // Moq code:
        var mocker = new Mock<IFilter>();
        mocker.Setup(x => x.Filter(
                It.IsAny<int>(), It.IsAny<int>(), It.IsAny<int>(),
                It.IsAny<int>(), It.IsAny<int>()))
            .Callback((int i1, int i2, int i3, int i4, int i5) => i1 * 2);

    Moq doesn't allow this because there is no generic Action that takes 5+ parameters. I've resorted to making my own stub. Obviously, it would be better to use Moq with all of its verifications, etc.

    Read the article

  • CodePlex Daily Summary for Thursday, March 18, 2010

    New Projects

      Bordecal tools for FxCop: Bordecal tools for FxCop provides an extended framework for FxCop rule development. It allows rule developers to avoid using embedded XML resource...
      DotNetNuke® Skin City: A DotNetNuke Design Challenge skin package submitted to the "Personal" category by allsnnskins. We integrate orange color and black colour in this ...
      DotNetNuke® Skin Dawn: A DotNetNuke Design Challenge skin package submitted to the "Out of the box" category by allsnnskins. This design reflects the theme of daylight. U...
      DotNetNuke® Skin Dream: A DotNetNuke Design Challenge skin package submitted to the "Personal" category by WhNuke. Uses the DNNJDMenu skin object.
      DotNetNuke® Skin Expression: A DotNetNuke Design Challenge skin package submitted to the "Out of the box" category by Salar Golestanian of SalarO. This is a pure CSS skin with ...
      DotNetNuke® Skin ModernBiz: A DotNetNuke Design Challenge skin package submitted to the "Modern Business" category by allsnnskins. This simple and unaffected company skin uses...
      DotNetNuke® Skin Profound: A DotNetNuke Design Challenge skin package submitted to the "Modern Business" category by WhNuke Technology. This skin is simple and clean and the ...
      DotNetNuke® Skin Technology: A DotNetNuke Design Challenge skin package submitted to the "Modern Standards" category by allsnnskins. It's compatible with common browsers such ...
      DotNetNuke® Skin Unravel: A DotNetNuke Design Challenge skin package submitted to the "Modern Business" category by Salar Golestanian of SalarO. This is a pure CSS skin wit...
      E! - ECMAScript Runtime Environment: E! (pronounced E-Bang) is a lightweight runtime environment for editing basic ECMAScript scripts with access to .NET Framework class libraries.
      Easy ArcGIS Library: Easy ArcGIS Library is a set of C# .net classes that wrap the common functionality of ArcObjects, that help ArcGIS developers do a lot of common fu...
      File Categorizer: The File Categorizer will help people tag the files on their system for easy searching. Instead of keyword searches, you can find files based on v...
      GMFS Cosmos: This is a file system for Cosmos, an OS that was built with C#, and we will be implementing this for Windows and Linux.
      IFilter Core Implementation (interface and structures): IFilter C# implementation for you to embed when writing Windows Search capabilities into your application.
      Image Wall Control for Silverlight: A control for Silverlight that emulates the wall of images in the Zune.
      imenik_za _dev4fun: imenik is a very simple program and easy to use where you can save and organise your contacts.
      LegoPhysX: LegoPhysX is an atomic based physics engine
      Personal Accounting: Personal system for managing financial accounts, which supports multiple accounts in different currencies. It has movement imputation and basic que...
      Pipes & Filters Engine: The Pipes & Filters Engine allows you to process a sequence of separate operations (filters) asynchronously in a multi-threaded manner. Filters wil...
      Prerequisites Checker: Check prerequisites for software. Example: Software S1 is delivered. S1 has prerequisites PR1, PR2... PRN You may load the config file for S...
      Puzzle Lib: A library for creating grid-and-tile puzzles. Includes two separate UIs for the Tetriminoes puzzle as examples.
      QuotesPlugin for Windows Live Writer: The QuotesPlugin for Windows Live Writer lists quotes from web sites such as quotes4all.net. It's very easy for you to select your favourite ones a...
      RobiJ2se: Robi j2se Learning!
      SkinEngine: This is a skin framework for C# WinForms. It is easy to use and creates great skins.
      SQL Azure .NET Connection: This is a demo application that shows how to connect with SQL Azure
      Supermarket Soft: WPF Application that helps you manage your supermarket shoppings.
      Tally Marks for Windows Phone 7 Series: Tally Marks is a counting application. It can count almost anything you'd like to count, and it does it with tally marks! Count the number of peo...
      TwitCast: TwitCast is a simple notifier for Twitter using the LINQ 2 Twitter library (http://linqtotwitter.codeplex.com/).
      WodnySwiat: Group project "wodny świat"
      WSS Task Manager Activity: A custom task creation activity that can be used in a sequential or state machine workflow. The activity was specifically developed to handle task ...

    New Releases

      Address Book: Address Book: Address Book
      AutoAudit: AutoAudit 1.10c: Version 1.10 includes most of the bug fix requests. Adds createdby and modifiedby columns to the audited base tables. If the user name is set by...
      blog for umbraco 4: Blog 4 Umbraco 2.0.26: Fixes: Regex bug in base; Directory urls and rss link bug; Open reader bug; Rss bug
      DotNetNuke® Skin City: City Package 1.0.0: A DotNetNuke Design Challenge skin package submitted to the "Personal" category by allsnnskins. We integrate orange color and black colour in this ...
      DotNetNuke® Skin Dawn: Dawn Package 1.0.0: A DotNetNuke Design Challenge skin package submitted to the "Out of the box" category by allsnnskins. This design reflects the theme of daylight. U...
      DotNetNuke® Skin Dream: Dream Package 1.0.0: A DotNetNuke Design Challenge skin package submitted to the "Personal" category by WhNuke. Uses the DNNJDMenu skin object.
      DotNetNuke® Skin Expression: Expression Package 1.0.0: A DotNetNuke Design Challenge skin package submitted to the "Out of the box" category by Salar Golestanian of SalarO. This is a pure CSS skin with ...
      DotNetNuke® Skin ModernBiz: ModernBiz Package 1.0.0: A DotNetNuke Design Challenge skin package submitted to the "Modern Business" category by allsnnskins. This simple and unaffected company skin uses...
      DotNetNuke® Skin Profound: Profound Package 1.0.0: A DotNetNuke Design Challenge skin package submitted to the "Modern Business" category by WhNuke Technology. This skin is simple and clean and the ...
      DotNetNuke® Skin Technology: Technology Package 1.0.0: A DotNetNuke Design Challenge skin package submitted to the "Modern Standards" category by allsnnskins. It's compatible with common browsers such a...
      DotNetNuke® Skin Unravel: Unravel Package 1.0.0: A DotNetNuke Design Challenge skin package submitted to the "Modern Business" category by Salar Golestanian of SalarO. This is a pure CSS skin with...
      E! - ECMAScript Runtime Environment: E! beta 1: This is really meant as a learning project for playing with dynamically compiled code, so you'd be better off getting the source code.
      Easy ArcGIS Library: EAGL Binaries: Easy ArcGIS Library Last Build (Version 1.1.2.4139)
      Easy ArcGIS Library: EAGL Binaries And Documentation: EAGL Latest Build With Documentation
      Easy ArcGIS Library: EAGL Documentation: EAGL 1.1.2.4139 Documentation
      Enterprise Library Extensions: Release 1.1: This is a service release for version 1.0. The installation process now works as intended. The assemblies are now visible in the Visual Studio As...
      Family Tree Analyzer: Version 1.2.1.0: Version 1.2.1.0 Fixed GB radio button not working renamed UK Added fixes for UK regions/shires/counties where country is missing Add country reco...
      Family Tree Analyzer: Version 1.3.0.0: Version 1.3.0.0 Added IGI Search results viewer. Tweaked filenames of IGI search so that results window has more informative display
      File Archive: File Archive: If your computer is only word processing machine or document merge machine, this program is really fit for you. It's so...o useful! This program ar...
      GameStore League Manager: League Manager 1.0.4: Fixes bug 7434. Changed version number to the standard format of Major.Minor.Release
      IFilter Core Implementation (interface and structures): Stable release: First release of interface implementation.
      IFilter Core Implementation (interface and structures): System.Search.Core: IFilter interface for implementation in your own Search Providers.
      imenik_za _dev4fun: imenik_aplikacija: imenik aplikacija is an application easy to use where you can save and organise your contacts.
      KDRE - kernel debugger regular expression extension: KDRE 0.0.2: KDRE - Windbg regexp extension. Changes: amd64 build added
      MapWindow6: MapWindow 6.0 msi (March 17): This release introduces some minor tweaks to the source code exposing more buffering functionality. This also fixes a problem with selecting point...
      MockingBird: MockingBird_2.0_RC: This is the V2.0 RC release. The documentation includes notes about the WCF components. Check this blog post for more details about the release. ...
      MPF for Projects - Visual Studio 2010: Visual Studio 2010 - Final Release: This contains the source code for the release of MPF for Projects corresponding to Visual Studio 2010. For Beta 2, you will need the Beta 2 release...
      Physics Helper for Silverlight, WPF, Blend, and Farseer: PhysicsHelper 3.0.0.4 ALPHA: This is an initial release that supports Windows Phone 7 Series Development, along with the Silverlight 3 and WPF support. It requires Visual Studi...
      Pocket GPW: Pocket GPW 1.2: Modifications per change set 56678. The previous database (from version 1.1) is compatible with the current one. Before installing, copy the previous database ...
      Prerequisites Checker: Prerequisites Checker: Check your software prerequisites
      Puzzle Lib: Puzzle Lib examples: Tetriminoes examples using a common Puzzle LIB and common Puzzle Implementation library, demonstrating a basic MVC architecture for game development
      RoTwee: RoTwee (8.0.7.0): Now you can rotate tweets by hand!
      SharePoint Icon Integration: SharePoint Icon Integration PDF: This is the first stable release of the SPIconIntegration. To install the PDF Icon integration just start the setup.exe file that you will find in ...
      SkinEngine: SkinEngine-Src-2010-03-17: this is a release on 2010-03-17
      Spell Corrector: Spell Corrector 0.2 Binary: Fixed a bug in the word indexing in the database.
      Spell Corrector: Spell Corrector 0.2 Code: Fixed a bug in the indexing of the words in the database. Now insertion of new words in the database is faster.
      SQL Azure .NET Connection: LittleBlackBook.NET Release 1.0: This was a demo project for a SQL Azure Presentation at Confoo
      SQL Server Extended Properties Quick Editor: New release 1.5.5: What's new: Move preferences to application settings and add a form to edit preferences. Support to add, modify and delete operations could be made ...
      SuperModel - A Dynamic View-Model Generator: 1.0.0.1 - Tyra+: Resolving a couple of bugs; models generated using INotifyPropertyChanged were not being created correctly. Property resolution on proxied types w...
      Survey - web survey & form engine: Survey 1.2.0: The Survey 1.2.0 release is based on the original sources of the Nsurvey 1.9 application. Compared to the Survey 1.1.0 version many new features ...
      T.S.T. the T-SQL Test Tool: Version 1.5: Version 1.5 changes: Bug fix. In V1.4 and earlier table comparison failed if the tables compared had columns with spaces in them.
      TwitCast: TwitCast 1.0.0.0: First release of TwitCast. Be warned that this is just a development release and there are a lot of things that remain to be done.
      unbinder: Unbound.dll: from change set ef6f2303dd32
      VCC: Latest build, v2.1.30317.0: Automatic drop of latest build
      WatchersNET.TagCloud: WatchersNET.TagCloud 01.02.00: What's New: Show only Tags from Pages the Current User has View Access (As Option). A Url can be specified for a Custom tag. Added Module Package fo...
      WSS Task Manager Activity: 1.0: Download either the source for Moss Task Manager Activity, Workflow sample if you are interested to see how to use the activity in the workflow or ...
      XML pretty print for python (xmlpp): version 0.92b: Fixes issues when element name contains :
      Xpress - ASP.NET MVC 个人博客程序: xpress2.1.1.0317.beta: Latest beta. Changes: templates and the configuration files required by the system moved into App_Data; Service objects injected into Controllers; Controller objects placed in the IoC container; email-sending bug fixed.

    Most Popular Projects

      MetaSharp, Rawr, WBFS Manager, Silverlight Toolkit, ASP.NET Ajax Library, Microsoft SQL Server Product Samples: Database, AJAX Control Toolkit, LiveUpload to Facebook, Windows Presentation Foundation (WPF), ASP.NET

    Most Active Projects

      LINQ to Twitter, Rawr, OData SDK for PHP, DirectQ, Open Data App Framework (ODAF), patterns & practices – Enterprise Library, BlogEngine.NET, jQuery Library for SharePoint Web Services, MapWindow6, NB_Store - Free DotNetNuke Ecommerce Catalog Module

    Read the article

  • Sharepoint .PDF contents displaying as 'searchtext.xml' in searches

    - by Green Muffins
    Hi Experts, I recently installed an ifilter in my SharePoint farm to enable searching the contents of .pdf documents. All went well, except that if I search for the contents of any .pdf file, the files appear in the search results with the document title "searchtext.xml", and the link to the document shows a giant page of the .pdf contents in an XML-looking browser page. I have added the .pdf file type to the search, so I am unsure why it is reading them incorrectly. If I search for a .pdf document title such as 'document.pdf', the result is displayed as an HTML page, though the link does lead to a readable .pdf file. Any help?

    Read the article

  • CS1685 Warning causes a CS0433 error when targeting 3.5 in VS2010

    - by Adam Driscoll
    I have a 2010 project that is targeting .NET v3.5. It was working fine until I started to mess with configurations a bit, and now I cannot figure out what I'm doing wrong. The project doesn't have ANY references added. It won't even let me add a reference to System.Core, as it is added by the 'build system'.

        warning CS1685: The predefined type 'System.Func' is defined in multiple assemblies in the global alias; using definition from 'c:\Windows\Microsoft.NET\Framework\v4.0.30319\mscorlib.dll'
        IFilter.cs(82,49): error CS0433: The type 'System.Func' exists in both 'c:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll' and 'c:\Windows\Microsoft.NET\Framework\v4.0.30319\mscorlib.dll'

    Looks like something is grabbing onto 4.0, but I'm not quite sure how to fix it. Anyone else run into this?

    Read the article

  • Document Stored in File System Text Searching and Filtering required in ASP .Net Application

    - by Harryboy
    Hello Experts, we are building a job-site application that will store the resumes of all candidates; the plan is to store them on the file system. We need to search inside those files and return the results to the user, so we need to decide on the best way to implement text searching. I have looked into it and found references such as IFilter (an API or interface) and Lucene.Net (open source), but I'm not sure either is the right solution. In the initial phase we expect around 50,000 resumes, and the solution should scale if the number increases. I would like a case study, some analysis, or your suggestions on the best method to handle this requirement (technology: ASP.NET). Thanks

    Read the article

  • SharePoint OCR image files indexing

    Introduction

    This article describes how to set up indexing of image files (including TIFF, PDF, JPEG, BMP...) using OCR technology. The indexing described below uses Microsoft IFilter technology and as such is not specific to SharePoint; it can be used with any product that uses Microsoft indexing: Microsoft Search, Desktop search, SQL Server search, and, through plug-ins, Google desktop search. I, however, use it with Microsoft Windows SharePoint Services 2003. For the other products, the registration may need to be slightly different.

    Background

    One of the projects I was working on required storage of old documents scanned into PDF files. A separate team of people was responsible for providing tags for a search engine so those image documents could be found. The whole process was clumsy, labor intensive, and error prone. That was what started me on my exploration path.

    OCR

    The first search I fired was for Open Source OCR products. Pretty quickly, I narrowed it down to Tesseract (http://code.google.com/p/tesseract-ocr/). Tesseract is an orphaned brainchild of HP, which worked on it from 1985 to 1995. It was then released as Open Source, and now, if I understand it correctly, Google is working on it. With credentials like that, it's no wonder that Tesseract scores one of the highest marks on OCR recognition and accuracy. After downloading it and struggling just a bit, I got Tesseract to work. The struggling part was that the home page claims that its base input format is a TIFF file. Maybe my TIFFs were bad, but I was able to get it to work only for BMP files.

    Image files conversion

    So now that I have an OCR that can convert BMP files into text, how do I get text out of the image PDF files? One more search, and I settled on ImageMagick (http://www.imagemagick.org/). This is another wonderful Open Source utility that can convert any file into an image. It did work out of the box, converting any TIFF files into bitmaps, but to get PDF files converted, it requires GhostScript (http://mirror.cs.wisc.edu/pub/mirrors/ghost/GPL/gs864/gs864w32.exe).

    Dealing with text PDFs

    With that utility installed, I was cooking: I can convert any file (in particular PDF and TIFF) into a bitmap, and then I can extract the text out of the bitmap. The only consideration was to somehow treat PDF files containing text differently; after all, OCR is very computation intensive and somewhat error prone even with perfect image quality and resolution. So another quick search, and I have PDFTOTEXT (ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl4-win32.zip). Thank God for Open Source! With these guys, I can pull text out of a PDF in an eye blink. However, I would get nothing for pure image PDFs, but I already have a solution for that!

    Batch process

    It took another 15 minutes to set up a batch script to automate the process:

      1. Check the file extension.
      2. If the file is a PDF file:
           a. Try to extract text out of it.
           b. If there is more than a certain amount of text in the file: done!
           c. If there is no text, convert the first page into a bitmap.
           d. Run OCR on the bitmap.
      3. For any other file type, convert the file into a bitmap.
      4. Run OCR on the bitmap.

    Once you unzip the attached project, check out the bin\OCR.BAT file. It will create a temporary file in the directory where your source file is, with the same name plus the '.txt' extension.
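    The decision logic of the batch process can be sketched in Python. This is an illustration only, not the article's OCR.BAT: the function names plan_steps and text_output_path, the step labels, and the 50-character threshold are assumptions introduced here (the article only says "more than certain amount of text"), and the sketch returns the planned steps instead of invoking pdftotext, ImageMagick or Tesseract.

```python
from pathlib import PureWindowsPath

# Assumed threshold; the article does not say how much text is "enough".
MIN_TEXT_CHARS = 50


def plan_steps(filename: str, embedded_text: str = "") -> list[str]:
    """Return the ordered tool steps for one file.

    `embedded_text` stands for whatever pdftotext extracted from a PDF;
    it is ignored for every other file type.
    """
    ext = PureWindowsPath(filename).suffix.lower()
    if ext == ".pdf":
        if len(embedded_text.strip()) > MIN_TEXT_CHARS:
            # Text PDF: the extracted text is good enough, skip OCR.
            return ["pdftotext"]
        # Image-only PDF: fall back to rendering the first page and OCR.
        return ["pdftotext", "pdf-first-page-to-bmp", "tesseract-ocr"]
    # Any other file type goes straight to bitmap conversion and OCR.
    return ["convert-to-bmp", "tesseract-ocr"]


def text_output_path(source: str) -> str:
    """Temporary output file: same directory and name plus '.txt'."""
    return source + ".txt"


if __name__ == "__main__":
    for name, text in [("report.pdf", "x" * 200), ("scan.pdf", ""), ("fax.tiff", "")]:
        print(name, "->", plan_steps(name, text), "->", text_output_path(name))
```

    Keeping the planning step separate from the tool invocations makes the branching easy to test without the external programs installed.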

    Read the article

  • Visual Studio 2010 Can no longer build .NET v3.5

    - by Adam Driscoll
    I have a 2010 project that is targeting .NET v3.5. Inexplicably, I can no longer build v3.5 projects. The project doesn't have ANY references added. It won't even let me add a reference to System.Core, as it is added by the 'build system'.

        warning CS1685: The predefined type 'System.Func' is defined in multiple assemblies in the global alias; using definition from 'c:\Windows\Microsoft.NET\Framework\v4.0.30319\mscorlib.dll'
        IFilter.cs(82,49): error CS0433: The type 'System.Func' exists in both 'c:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll' and 'c:\Windows\Microsoft.NET\Framework\v4.0.30319\mscorlib.dll'

    Looks like something is grabbing onto 4.0, but I'm not quite sure how to fix it. Anyone else run into this? A coworker had this same issue; it took a reinstall of Windows to correct the problem. I've opened a bug on this one: https://connect.microsoft.com/VisualStudio/feedback/details/558245/warning-cs1685-when-compiling-a-v3-5-net-application-in-visual-studio-2010

    If the compiler is set to verbose I see this:

        FrameworkPathOverride = C:\Windows\Microsoft.NET\Framework\v4.0.30319

    which is defined as: "Specifies the location of mscorlib.dll and microsoft.visualbasic.dll. This parameter is equivalent to the /sdkpath switch of the vbc.exe compiler."

    Some other interesting tidbits: I've created a new project altogether and cannot build v3.5 at all. I can build 2.0, 3.0, 3.5 Client Profile, 4.0 and 4.0 Client Profile with no problem. VB.NET can build v3.5 but C# cannot. I've tried a reinstall of .NET 3.5, 4.0 and Visual Studio 2010 with no success. Visual Studio debug logs show nothing interesting and Safe Mode does not work. Trying to avoid a Windows reinstall...

    Read the article

  • How do I add a where filter using the original Linq-to-SQL object in the following scenario

    - by GenericTypeTea
    I am performing a select query using the following Linq expression:

        Table<Tbl_Movement> movements = context.Tbl_Movement;
        var query = from m in movements
                    select new MovementSummary
                    {
                        Id = m.DocketId,
                        Created = m.DateTimeStamp,
                        CreatedBy = m.Tbl_User.FullName,
                        DocketNumber = m.DocketNumber,
                        DocketTypeDescription = m.Ref_DocketType.DocketType,
                        DocketTypeId = m.DocketTypeId,
                        Site = new Site()
                        {
                            Id = m.Tbl_Site.SiteId,
                            FirstLine = m.Tbl_Site.FirstLine,
                            Postcode = m.Tbl_Site.Postcode,
                            SiteName = m.Tbl_Site.SiteName,
                            TownCity = m.Tbl_Site.TownCity,
                            Brewery = new Brewery()
                            {
                                Id = m.Tbl_Site.Ref_Brewery.BreweryId,
                                BreweryName = m.Tbl_Site.Ref_Brewery.BreweryName
                            },
                            Region = new Region()
                            {
                                Description = m.Tbl_Site.Ref_Region.Description,
                                Id = m.Tbl_Site.Ref_Region.RegionId
                            }
                        }
                    };

    I am also passing in an IFilter class into the method where this select is performed.

        public interface IJobFilter
        {
            int? PersonId { get; set; }
            int? RegionId { get; set; }
            int? SiteId { get; set; }
            int? AssetId { get; set; }
        }

    How do I add these where parameters into my SQL expression? Preferably I'd like this done in another method as the filtering will be re-used across multiple repositories. Unfortunately when I do query.Where it has become an IQueryable<MovementSummary>. I'm assuming it has become this as I'm returning an IEnumerable<MovementSummary>. I've only just started learning LINQ, so be gentle.

    Read the article
