Image extraction using iTextSharp in C#.net

hatem hatem
After searching many websites, I found this code (C#.NET):
 
//begin code
 
        public static void ExtractImagesFromPDF(string sourcePdf, string outputPath, int startIndex)
        {
            int index = startIndex;
            // NOTE:  This will only get the first image it finds per page.
            PdfReader pdf = new PdfReader(sourcePdf);
            RandomAccessFileOrArray raf = new iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf);
            try
            {
                for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
                {
                    PdfDictionary pg = pdf.GetPageN(pageNumber);
                    // recursively search pages, forms and groups for images.
                    PdfObject obj = FindImageInPDFDictionary(pg);
                   
                    if (obj != null)
                    {
                        int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(System.Globalization.CultureInfo.InvariantCulture));
                        PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
                        PdfStream pdfStrem = (PdfStream)pdfObj;
                        byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStrem);
                        if ((bytes != null))
                        {
                            using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
                            {
                                try
                                {
                                    memStream.Position = 0;
                                    System.Drawing.Image img = System.Drawing.Image.FromStream(memStream);
                                    // must save the file while stream is open.
                                    if (!Directory.Exists(outputPath))
                                        Directory.CreateDirectory(outputPath);
                                    string path = Path.Combine(outputPath, String.Format(@"{0}.jpg", index));
                                    System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
                                    parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
                                    System.Drawing.Imaging.ImageCodecInfo jpegEncoder = GetImageEncoder("JPEG");
                                    img.Save(path, jpegEncoder, parms);
                                    index++;
                                }
                                catch (Exception ex) {  }
                            }
                        }
                    }
                   
                }
            }
            catch(Exception ex)
            {
                MessageBox.Show(ex.Message+"\n\n"+ex.StackTrace);
            }
            finally
            {
                pdf.Close();
                raf.Close();
            }
        }
       
        private static PdfObject FindImageInPDFDictionary(PdfDictionary pg)
        {
            PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
            PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
            if (xobj != null)
            {
                foreach (PdfName name in xobj.Keys)
                {
                    PdfObject obj = xobj.Get(name);
                    if (obj.IsIndirect())
                    {
                        PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                        PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
                        //image at the root of the pdf
                        if (PdfName.IMAGE.Equals(type))
                        {
                            return obj;
                        }
                        // image inside a form
                        else if (PdfName.FORM.Equals(type))
                        {
                            return FindImageInPDFDictionary(tg);
                        }
                        //image inside a group
                        else if (PdfName.GROUP.Equals(type))
                        {
                            return FindImageInPDFDictionary(tg);
                        }
                    }
                }
            }
            return null;
        }
 
//end code
 
The code above gets some images (not all) from a PDF document.
I would like someone to help me correct it so that it extracts all images.
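
One likely reason images are missed: `FindImageInPDFDictionary` returns as soon as it meets the first image (or the first form or group, even one that contains no image), so each page yields at most one result. A minimal sketch of a variant that gathers every match instead is below. It uses only the iTextSharp calls already present in the code above; the class and method names (`ImageRefCollector`, `CollectImageRefs`) are made up for illustration, and it has not been run against any particular iTextSharp version.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class ImageRefCollector
{
    // Hypothetical helper (not from the original post): adds every image
    // XObject reachable from a page, form, or group dictionary to 'found'.
    // Assumes the PDF has no form that references itself; a visited set
    // would be needed to guard against such cycles.
    public static void CollectImageRefs(PdfDictionary container, List<PdfObject> found)
    {
        if (container == null) return;

        // A missing /Resources or /XObject entry means nothing to collect here.
        PdfDictionary res = PdfReader.GetPdfObject(container.Get(PdfName.RESOURCES)) as PdfDictionary;
        if (res == null) return;

        PdfDictionary xobj = PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT)) as PdfDictionary;
        if (xobj == null) return;

        foreach (PdfName name in xobj.Keys)
        {
            PdfObject obj = xobj.Get(name);
            if (obj == null || !obj.IsIndirect()) continue;

            PdfDictionary tg = PdfReader.GetPdfObject(obj) as PdfDictionary;
            if (tg == null) continue;

            PdfName subtype = PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE)) as PdfName;

            if (PdfName.IMAGE.Equals(subtype))
            {
                // Record the reference and keep scanning instead of returning.
                found.Add(obj);
            }
            else if (PdfName.FORM.Equals(subtype) || PdfName.GROUP.Equals(subtype))
            {
                // Descend into nested containers, then continue with siblings.
                CollectImageRefs(tg, found);
            }
        }
    }
}
```

In the page loop, the single `obj` would then be replaced by a `List<PdfObject>` filled by `CollectImageRefs(pg, list)`, with the existing extraction body run once per entry.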
 
I also found this code:
 
//begin code:
 
        public static System.Drawing.Imaging.ImageCodecInfo GetImageEncoder(string imageType)
        {
            imageType = imageType.ToUpperInvariant();
            foreach (ImageCodecInfo info in ImageCodecInfo.GetImageEncoders())
            {
                if (info.FormatDescription == imageType)
                {
                    return info;
                }
            }
            return null;
        }
 
        public int ExtractImages(String sourcePdf, string outputPath)
        {
            int imageNumber = 1;
            int i = 0;
            ImageCodecInfo jpegEncoder = GetImageEncoder("JPEG");
            iTextSharp.text.pdf.RandomAccessFileOrArray raf = null;
            iTextSharp.text.pdf.PdfReader reader = null;
            iTextSharp.text.pdf.PdfObject pdfObj = null;
            iTextSharp.text.pdf.PdfStream pdfStrem = null;
            raf = new iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf);
            reader = new iTextSharp.text.pdf.PdfReader(raf, null);
            //System.Console.WriteLine("XrefSize " + reader.XrefSize);
            for (; i < reader.XrefSize; i++)
            {
                try
                {
                    pdfObj = reader.GetPdfObject(i);
                    if (pdfObj != null && pdfObj.IsStream())
                    {
                        pdfStrem = (PdfStream)pdfObj;
                        PdfObject subtype = pdfStrem.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);
                        if (subtype != null && subtype.ToString() == PdfName.IMAGE.ToString())
                        {
                            byte[] bytes = PdfReader.GetStreamBytesRaw((iTextSharp.text.pdf.PRStream)pdfStrem);
                            if (bytes != null)
                            {
                                try
                                {
                                    using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
                                    {
                                        memStream.Position = 0;
                                        System.Drawing.Image img = System.Drawing.Image.FromStream(memStream);
                                        // must save the file while stream is open.
                                        if (!Directory.Exists(outputPath))
                                            Directory.CreateDirectory(outputPath);
                                        string path = Path.Combine(outputPath, String.Format(@"{0}.jpg", imageNumber));
                                        System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
                                        parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
                                        // GetImageEncoder is defined above this method
                                        //System.Drawing.Imaging.ImageCodecInfo jpegEncoder = GetImageEncoder("JPEG");
                                        img.Save(path, jpegEncoder, parms);
                                        imageNumber++;
                                        continue;
                                    }
                                }
                                catch (Exception ex)
                                {
                                    //System.Console.WriteLine("inner " + ex.Message);
                                }
                            }
                        }
                    }
                }
                catch (Exception ex)
                {
                    i++;
                    //System.Console.WriteLine("outer " + ex.Message);
                }
            }
            reader.Close();
            raf.Close();
            return imageNumber - 1;
        }
//end code
 
Please help me correct either of the two approaches above so that it extracts all images.
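
Both approaches pass the raw stream bytes straight to `System.Drawing.Image.FromStream`. That can only succeed when the raw bytes are already a complete image file, which in a PDF is typically the case for streams whose filter is `/DCTDecode` (JPEG). Streams with other filters fail in `FromStream`, and the empty `catch` hides the failure. The sketch below is plain C# with no iTextSharp dependency; the names `ImageStreamSniffer`, `LooksLikeJpeg`, and `DescribeRawBytes` are hypothetical, and the filter name is assumed to arrive as a string with its leading slash.

```csharp
using System;

public static class ImageStreamSniffer
{
    // True when the data begins with the JPEG start-of-image marker (FF D8).
    public static bool LooksLikeJpeg(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0xFF && data[1] == 0xD8;
    }

    // Hypothetical helper: says what the *raw* (still encoded) bytes of an
    // image stream are for a single filter name. A stream's /Filter entry can
    // also be an array of names; that case is not handled in this sketch.
    public static string DescribeRawBytes(string filterName)
    {
        switch (filterName)
        {
            case "/DCTDecode":
                return "complete JPEG file; can be saved or decoded as-is";
            case "/JPXDecode":
                return "JPEG 2000 data; needs a JPEG 2000 decoder";
            case "/FlateDecode":
                return "zlib-compressed pixel samples with no file header; must be "
                     + "inflated and rebuilt from /Width, /Height, /ColorSpace and "
                     + "/BitsPerComponent";
            default:
                return "not a standalone image file; needs filter-specific decoding";
        }
    }
}
```

A check such as `LooksLikeJpeg(bytes)` before calling `FromStream` would at least make it visible which images are being skipped and why. Separately, the `i++` in the outer `catch` of `ExtractImages` makes the loop skip the object that follows any failure, which is probably not intended.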
 
My Regards.


_______________________________________________
itextsharp-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itextsharp-questions
Re: Image extraction using iTextSharp in C#.net

wenbuyi
This post has NOT been accepted by the mailing list yet.
You can try the xspdf control; it can extract images from a PDF and convert a PDF to images in C#.NET.