Friday, September 4, 2015

Automatic Restore from Nuget

Package managers like npm and NuGet have, in a short span of time, become the first choice for managing any external/internal dependency. They not only make resolving dependencies easy but also encourage best practices like versioning. If you are still skeptical, I recommend opening your Visual Studio and giving it a try. Here is a nice article about it: https://docs.nuget.org/create/hosting-your-own-nuget-feeds

I spent quite an amount of time figuring out the best way to use NuGet and how to make it hassle free. I don't recommend throwing every internal dependency in as a NuGet package. Rather, you should divide your application into independent components; if each component is a candidate for independent development and can have multiple versions in the future, then NuGet is your best bet.

Working with NuGet in Visual Studio is very easy: all you have to do is add your local NuGet server address to the available package sources and use it to resolve any dependency.




When any NuGet reference is added to a solution, NuGet maintains its own artifacts at solution and project level.
The project level contains only one file, packages.config, which holds the dependent package information such as version, framework and id.

The solution level contains a hidden .nuget folder which contains:
  1. NuGet Config 
  2. NuGet.exe (Can be excluded from source control)
  3. NuGet.targets


The .nuget folder is important if you want automatic restore, which means that when you build, Visual Studio will look for external dependencies and, in case it is not able to locate them, will invoke nuget.exe to resolve them using the configuration present either in NuGet.config in the solution directory or in %ProgramData%\NuGet\Config\. NuGet also maintains its own local cache at %LOCALAPPDATA%\NuGet\Cache\, and if it finds the NuGet package in the local cache it won't go out to probe your NuGet feed server.

So if we want automatic restore, this .nuget folder is of great significance. I think the best way to maintain a coherent restore mechanism is to define the configuration in the NuGet.config file and add it to source control. When another team member downloads a version of the code, they will have the exact same configuration to restore from, which saves a lot of hassle. To enable automatic restore you can either right-click the solution, where there is an option to enable automatic restore, or you can do it via configuration.

So there are two important things to add in the configuration to enable automatic restore: set the package restore key to true, and optionally add the NuGet server address, in case you want to use your own server.
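A minimal NuGet.config along those lines might look like the following (the feed name and URL are placeholders for your own server):

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <!-- Enable automatic package restore -->
  <packageRestore>
    <add key="enabled" value="true" />
    <add key="automatic" value="true" />
  </packageRestore>
  <!-- Optional: your internal feed (placeholder name and URL) -->
  <packageSources>
    <add key="InternalFeed" value="http://your-nuget-server/nuget" />
  </packageSources>
</configuration>
```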





This will allow us to enforce the configuration for everyone who downloads the code. To make this more fun, we can automate the NuGet restore by creating a small batch file.


 @echo off  
 SETLOCAL  
 REM Check whether NuGet.exe is already cached under Local App Data  
 SET CACHED_NUGET_LOCATION=%LocalAppData%\NuGet\NuGet.exe  
 REM If cached then copy to local project folder  
 IF EXIST %CACHED_NUGET_LOCATION% goto copynuget  
 REM Download the file  
 echo Downloading latest version of NuGet.exe...  
 IF NOT EXIST %LocalAppData%\NuGet md %LocalAppData%\NuGet  
 REM - The Invoke-WebRequest variant needs a recent PowerShell version, so it is kept for reference only, as many CDK systems still run old Windows 7  
 REM powershell -NoProfile -ExecutionPolicy unrestricted -Command "$ProgressPreference = 'SilentlyContinue'; Invoke-WebRequest 'https://www.nuget.org/nuget.exe' -OutFile '%CACHED_NUGET_LOCATION%'"  
 powershell -NoProfile -ExecutionPolicy unrestricted -Command "$ProgressPreference = 'SilentlyContinue'; (New-Object System.Net.WebClient).DownloadFile('https://www.nuget.org/nuget.exe', '%CACHED_NUGET_LOCATION%')"  
 REM Copy NuGet.exe to the local .nuget directory  
 :copynuget  
 echo NuGet.exe exists. Copying to solution folder...  
 IF EXIST .nuget\nuget.exe goto restore  
 md .nuget  
 copy %CACHED_NUGET_LOCATION% .nuget\nuget.exe > nul  
 REM Run the package restore command  
 :restore  
 echo Deleting cached packages...  
 del %LocalAppData%\NuGet\Cache\*.nupkg /q  
 echo Initiating package restore...  
 .nuget\NuGet.exe restore -ConfigFile ".nuget\NuGet.Config"  
 echo Success!!!  


Place this batch file in the solution root folder, alongside the .nuget folder, and execute it. Remember not to add the packages folder to source control; that would defeat the whole purpose of using NuGet.

Okay so guys enjoy flexibility!

Update - Newer versions of MSBuild already have the NuGet restore feature out of the box, so if you need to restore during build, just enable NuGet Restore and you are good.

Tuesday, May 5, 2015

Parse Large Flat File using Hash Table


Recently I stumbled upon a very interesting requirement which was simple and tricky at the same time. Let me outline the objective for ya.
The requirement is a flat file reader tool which has the ability to parse flat files (there can be any number of flat files, but for each file the code should be extensible with minimal changes). The flat file format is fixed for now: each line starts with a key, followed by a set of values separated by commas. The objective is to read a particular key from the user and, for that key, display all the records present in all the flat files. There can be multiple files, and the file sizes can grow.

To solve this type of requirement, I zeroed on below approaches-

  1. On-Demand Parsing: this approach is the simplest and parses the data source each time a customer record is requested
  2. Index-based Parsing: this approach builds an index for the data source and provides random retrieval of data from it
  3. Index-based parsing with Memory-Mapped files – this approach has all the features of the above parsing and additionally maps sections of the file into memory. This is used for very huge files.
Analyze - Option 1  

At first glance the problem seems very simple and boils down to parsing the key, finding the set of values for the key, and repeating until you find the end of the file. However, there is a serious issue: for every query the file has to be parsed completely, as there can be more than one value per key. Performing the same task repeatedly is not only suboptimal but a waste of CPU, memory and time :). Also, as the file grows in size, the time taken will increase, so I quickly discarded this naive approach and looked at the other options available.

Analyze - Option 3  

Option 3 is the best approach when dealing with GBs of data. In this approach we can't bring the entire file into memory as it is too huge, so jumping directly to a particular record index or offset is not a very good approach. Instead, this involves dividing the entire file into virtual buckets of records and storing the index/offset of a particular record relative to the bucket's starting position. This lets us open a specific portion of the file as a FileStream, map it into memory, and read the data for the particular index. This approach makes more sense for very large files, but my requirement is not for very large files, rather for marginally large files of the order of ~1 GB.



Analyze - Option 2

This seemed to be the best option, as my file size is not of the order of >1 GB. I re-factored my approach to option 2 and used a hash table, i.e. Dictionary (the hash table implementation in C#). The idea was to parse the whole file once and index all the records available in the flat files using record offsets. This allows a query to look in the index once and find the values straight away, rather than scanning the file repeatedly.



Index Based parsing 

 Index Structure


I decided to have index like -

 Index<int,List<FilePointers>>

My index is a dictionary with keys (Int32) and values (a list of file pointers, i.e. longs).
The index maintains the CustomerId as the key, and as values the list of all offsets that point to records for that key. So the approach was to read the files and maintain an offset location for each key or customer id.

File Scanning or Index Building Logic

The file scanning logic simply reads each record, extracts the key from the record, and adds the key and record position to the dictionary. If another record with the same key is found, its record position is appended to the file-pointer list.
  • Start reading the file from offset 0 and extract each line
  • Split the line into its comma-separated values and extract the value to be used as the key
  • Add the key and the record position as a list entry in the index
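The scanning steps above can be sketched as follows. This is a minimal sketch, not the repo's actual implementation; it assumes ASCII content with CRLF line endings so that byte offsets can be derived from character counts, and the class and member names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class FlatFileIndexer
{
    // CustomerId -> list of byte offsets where records for that key start
    public Dictionary<int, List<long>> Index = new Dictionary<int, List<long>>();

    public void BuildIndex(string path)
    {
        long offset = 0;
        foreach (var line in File.ReadLines(path))
        {
            // the key is the first comma-separated value on the line
            int key = int.Parse(line.Split(',')[0]);
            List<long> pointers;
            if (!Index.TryGetValue(key, out pointers))
                Index[key] = pointers = new List<long>();
            pointers.Add(offset);
            // +2 accounts for the CRLF terminator (assumes ASCII content)
            offset += line.Length + 2;
        }
    }
}
```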


Retrieval Logic

While retrieving, jump straight to the offset location, retrieve the complete line and display it to the user. This works well if the index is built prior to the query, but if the user doesn't opt to build the index, or a record was added after the index was built (addition of records with a new key), in those cases we need to fall back to eager loading or rescan the file for the record.
  • For a particular key requested, check in Index for the value
  • If value found, retrieve the value from each file for the particular key and return to user
  • If value not found, run Rescanning logic for the particular key
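The retrieval step can be sketched as below, assuming the offsets come from an index like the one described above (class and method names are illustrative, not from the repo):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class RecordReader
{
    // Jump straight to each offset and read back the full record line
    public static List<string> GetRecords(string path, List<long> offsets)
    {
        var records = new List<string>();
        using (var stream = File.OpenRead(path))
        {
            foreach (long offset in offsets)
            {
                stream.Seek(offset, SeekOrigin.Begin);
                // a fresh reader per seek avoids stale buffered data
                using (var reader = new StreamReader(stream, Encoding.ASCII, false, 1024, leaveOpen: true))
                {
                    records.Add(reader.ReadLine());
                }
            }
        }
        return records;
    }
}
```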

Rescanning logic

The rescanning logic is used for eager loading when the index does not contain the key. It invokes the same logic used in scanning, but with a limit on the number of records. A file's key may be a primary key, which means there can be only one record per key, or there may be a limitation that for one key we can have at most, say, 10 records. Given such a key-to-record mapping constraint, it is better to look only for the maximum number of records rather than scanning the whole file. So my logic goes like this:
  • Define a FileMaxRecordPerKey value (max number of records a file can have for a particular key); if undefined, assume Int.Max
  • Scan the file for the key and, for every record found, increment a counter by 1
  • If the counter value is equal to FileMaxRecordPerKey, stop further scanning
  • Add the key and value to the index and return to the user
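The bounded rescan above can be sketched as follows (again a minimal sketch with illustrative names, assuming the same ASCII/CRLF file layout as before):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class Rescanner
{
    // Bounded rescan: stop once maxRecordsPerKey matches are found.
    // Pass int.MaxValue when the per-key record count is unconstrained.
    public static List<long> Rescan(string path, int key, int maxRecordsPerKey)
    {
        var pointers = new List<long>();
        long offset = 0;
        foreach (var line in File.ReadLines(path))
        {
            if (int.Parse(line.Split(',')[0]) == key)
            {
                pointers.Add(offset);
                if (pointers.Count == maxRecordsPerKey)
                    break; // key-to-record constraint reached, stop scanning
            }
            offset += line.Length + 2; // assumes CRLF line endings
        }
        return pointers;
    }
}
```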

Nutshell

  • The tool maintains an index in a hash table for every record stored in all the flat files. This index is in-memory and is recreated when the tool starts
  • The tool provides the functionality to prebuild the index, or to skip index building and fall back to on-demand parsing
  • For fallback scenarios where on-demand parsing is used, the tool creates the index for the requested record, which makes any subsequent query for that id less expensive
  • In fallback cases the files are scanned completely, but when the record count is limited, further scanning stops once the maximum number of records is found. This is implemented as the FileMaxRecordPerKey property
  • Any record not found in the prebuilt index is located using on-demand parsing
  • Dynamic addition of records to any of the flat files is not supported by the above logic, and there is no implementation of a dirty index
For working source code, check out the Git repo - https://bitbucket.org/anupkumarsharma/flatfileparser/src






Tuesday, June 17, 2014

Extract Property Value from Object/Object Graph

The code below should be able to parse an object graph for the given property path and return the value (or null) as the response. String literals like the following can be parsed with this logic:



            string s = "P[0].A";
            string g = "T[0].U.D";
            string t = "S[0].P[7].D";
            string NegativeValue = "P[0].WRONGVALUE";




// Requires: using System; using System.Collections; using System.Linq; using System.Reflection;
public static object GetPropertyValue(string name, object obj, Type type)
{
    const string ArrayIdentifier = "[";   // marks an indexed part like "P[0]"

    var parts = name.Split('.').ToList();
    var currentPart = parts[0];
    var backup = parts[0];

    int index = 0;
    if (currentPart.Contains(ArrayIdentifier))
    {
        // extract the index between the brackets, e.g. "P[7]" -> 7
        index = int.Parse(currentPart.Substring(currentPart.IndexOf('[')).Replace('[', ' ').Replace(']', ' ').Trim());
        currentPart = currentPart.Substring(0, currentPart.IndexOf('['));
    }

    PropertyInfo info = type.GetProperty(currentPart);
    if (info == null) { return null; }

    // string also implements IEnumerable, so exclude it here; otherwise
    // the character enumeration would swallow string-typed properties
    if (info.PropertyType != typeof(string) && info.PropertyType.GetInterface("IEnumerable") != null)
    {
        int itemNb = 0;
        foreach (object item in (IEnumerable)info.GetValue(obj, null))
        {
            if (itemNb == index)
            {
                parts.Remove(backup);
                return GetPropertyValue(string.Join(".", parts), item, item.GetType());
            }
            itemNb++;
        }
        // the requested index is out of range of the values provided
        throw new ArgumentOutOfRangeException();
    }

    if (name.IndexOf('.') > -1)
    {
        // more parts remain, so recurse into the nested object
        parts.Remove(backup);
        return GetPropertyValue(string.Join(".", parts), info.GetValue(obj, null), info.PropertyType);
    }

    if (info.PropertyType.IsValueType || info.PropertyType == typeof(string))
    {
        return info.GetValue(obj, null).ToString();
    }

    return null;
}
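A hypothetical usage might look like this. The Order and Product classes are illustrative types of mine that match the "P[0].A" pattern; the commented call invokes the GetPropertyValue method above.

```csharp
using System.Collections.Generic;

public class Product { public string A { get; set; } }
public class Order   { public List<Product> P { get; set; } }

class Demo
{
    static void Main()
    {
        // build a small object graph: order.P[0].A holds "42"
        var order = new Order { P = new List<Product> { new Product { A = "42" } } };

        // walks the P list, takes element 0, then reads its A property:
        // object value = GetPropertyValue("P[0].A", order, typeof(Order));
    }
}
```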