Using IFilter in C# by bypassing COM

Posted on March 12th, 2006.

I’ve been using IFilters in a C# application I’m working on, and it hasn’t been fun at all. There are all kinds of problems with COM threading and then there are some malfunctioning filter implementations.

Well, I decided one day to get to the bottom of these problems and finally created my own implementation of the LoadIFilter function.
The LoadIFilter function is used to find a filter implementation for a given file. My implementation does what LoadIFilter does (and a bit more), but it does not involve COM in the process, so it avoids the threading problems mentioned above. So far, it hasn’t introduced any new problems.

Anyway, I packaged all that information (and source code) in an article and posted it to CodeProject. You can find it here. It has some nice information on how to dynamically load a DLL and call a function pointer obtained with GetProcAddress (which was not possible before .NET 2.0).
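To give a rough idea of the technique, here is a minimal sketch, not the code from the article. It assumes the function being looked up is DllGetClassObject, the standard export of a COM server DLL, and that the goal is to get a class factory without going through the COM runtime's own loading. The class name NativeLoaderSketch and the method GetClassFactory are made up for illustration. The part that .NET 2.0 added is Marshal.GetDelegateForFunctionPointer, which turns the raw pointer returned by GetProcAddress into a callable delegate.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; names are not from the article.
public static class NativeLoaderSketch
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr LoadLibrary(string lpFileName);

    // GetProcAddress only exists in an ANSI version, hence ExactSpelling.
    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    private static extern IntPtr GetProcAddress(IntPtr hModule, string procName);

    // Managed signature matching the native DllGetClassObject export.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DllGetClassObjectDelegate(
        ref Guid rclsid,
        ref Guid riid,
        [MarshalAs(UnmanagedType.Interface)] out object ppv);

    // IID of the standard IClassFactory interface.
    private static readonly Guid IID_IClassFactory =
        new Guid("00000001-0000-0000-C000-000000000046");

    // Loads the DLL directly and asks it for the class factory of the given CLSID.
    public static object GetClassFactory(string dllPath, Guid clsid)
    {
        IntPtr module = LoadLibrary(dllPath);
        if (module == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        IntPtr proc = GetProcAddress(module, "DllGetClassObject");
        if (proc == IntPtr.Zero)
            throw new EntryPointNotFoundException(
                "DllGetClassObject is not exported by " + dllPath);

        // Wrap the raw function pointer in a delegate (available since .NET 2.0).
        DllGetClassObjectDelegate getClassObject =
            (DllGetClassObjectDelegate)Marshal.GetDelegateForFunctionPointer(
                proc, typeof(DllGetClassObjectDelegate));

        Guid iid = IID_IClassFactory; // local copy, because the field is readonly
        object factory;
        int hr = getClassObject(ref clsid, ref iid, out factory);
        if (hr < 0)
            Marshal.ThrowExceptionForHR(hr); // negative HRESULTs are failures

        return factory;
    }
}
```

The returned object would still have to be cast to a managed declaration of IClassFactory before an instance can be created from it, and the module handle is deliberately never freed here because the created objects live inside the loaded DLL.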

Hope you’ll find it useful.

UPDATE: The article moved to its permanent location after being edited. Link updated.

15 Responses to “Using IFilter in C# by bypassing COM”

re: Visio 2003 IFilter and MTA…

Hey Eyal,

First, thanks for the code. We are testing your code in our site and are getting:
System.Runtime.InteropServices.COMException (0x80030050): already exists. (Exception from HRESULT: 0x80030050 (STG_E_FILEALREADYEXISTS))
at System.Runtime.InteropServices.ComTypes.IPersistFile.Load(String pszFileName, Int32 dwMode)
at EPocalipse.IFilter.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
at EPocalipse.IFilter.FilterReader..ctor(String fileName)

Which is strange as you say there are no COM objects used.

Any help will be appreciated.

Alon

Alon
January 3rd, 2008

The library bypasses the COM infrastructure normally used to load and create the COM object, but since the interfaces are COM-based, the errors you see are still COM errors.
The error you get usually happens when you try to open a file that is already open.

Eyal Post
January 8th, 2008

Shalom Eyal,
How can I program a global image filter in C# or Java?
I mean something like an IFilter for the whole system.
I want any image, video, or Flash content coming to the screen to go through the built filter (a GUID or any image filter built in C# or Java).
Toda (thanks),
Zeev

zeev
May 11th, 2008

Hi,
First off, thanks for posting this example. I’m trying to use your posted library to scan a collection of documents. Everything works fine for the first file in the collection, but then on the second file I get an error: “already exists. (Exception from HRESULT: 0x80030050 (STG_E_FILEALREADYEXISTS))”. Do you have any pointers on how to get around this? Thanks! Allan

Allan
June 10th, 2008

Hi Eyal,
Thanks for posting the code!
I’m trying to use your IFilter to extract text from different file types. It works well on Word and text files. I’m having some problems with Excel files: the text I receive has the strings in a different order. First I receive the text and then the numbers. It sounds strange, but I get the same thing with other implementations of IFilter.
With .csv files I get the error: “already exists. (Exception from HRESULT: 0x80030050 (STG_E_FILEALREADYEXISTS))”.
Any help is appreciated.
Thank you!
Doru

Doru
June 12th, 2008

I can’t seem to find the source zip anymore at:
http://www.codeproject.com/KB/cs/IFilter.aspx

Could you tell me where I might download it from, or send me a copy?

Thanks!
John
Massachusetts, US

JohnH
August 26th, 2008

The zip file source is back today. Thanks.

JohnH
August 26th, 2008

“It is always a pleasure to read smart people.”

Alex K
December 3rd, 2008

Hey Eyal, I need to know which properties (e.g. author, date created) are supported by each IFilter (e.g. .docx, .xlsx) in Filter Pack 2007. Do you have any idea about it?

UJ
January 7th, 2009

Hey Eyal,

Thanks for the code. I am using your code for my search solution and ran into a little issue when I tried to modify it. I only need to parse PDF and both Office 2003 and 2007 document types. Instead of looking up the IFilter in the registry by extension, I found which IFilters are used by my local system using the IFilter Explorer software, copied the DLLs into the bin directory of my project and hardcoded the Persistent Handler Addin values in my code. Then I matched extensions to DLL location and persistent handler value and extracted content using these hardcoded values. Everything worked perfectly fine until I tried to run the same code on another machine (both machines running Vista). I could no longer extract content on this other machine even though I used the same IFilter DLL files and the same Persistent Handler values. Is the Persistent Handler DLL-specific or installation/workstation-specific? What’s the difference between the “Persistent Handlers” and “Persistent Handlers Addins Registered” values? I see both of these values in IFilter Explorer next to the DLL name and file extension, but I don’t know what they are used for or how they differ. According to your code, you use the “Persistent Handlers Addins Registered” value in order to load the filter from the DLL. I did the same.

I’ve been stuck on this problem for almost a week now and you are my best chance to get it resolved. At least point me in the right direction. All I am trying to do is use the latest IFilter DLLs in my project instead of hoping every machine has the latest IFilter DLL versions for the file types that I need.

Thanks again for your code and time,

Ilia

Anonymous
April 10th, 2009

hi Eyal

Thanks for your code. There is a slight change in my requirement: I need to extract the page numbers from a PDF file while extracting its text. Could you please help me with this?

thanks a bunch.

Sanjay
April 21st, 2009

hi Eyal

Thanks for your code. There is a slight change in my requirement: I need to extract the page numbers from a PDF file while extracting its text. Could you please help me with this?

kiran
December 16th, 2009

I just posted a patch to CodeProject that fixes the Adobe IFilter DLL load problem.
You can find it here:
http://www.codeproject.com/Messages/3414291/Re-pdf-files-ifilter-load-fails-for-AcroRdIF-dll-i.aspx

Thanks for your work.

Claudio
March 24th, 2010

Hi Eyal,
I just wanted to say thank you for making this code available to developers. I had written some code of my own; then, whilst looking for a solution to some issues, I came across yours, which has saved me a lot of time.
All the best
Gary Lee

Gary
April 15th, 2010
