Python 4. 3DPDF’s CSV’s, Pandas & .Exe’s

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  

I have a process where I export Revit Models with Shared Parameter Data to a 3DPDF which can be read by anyone with a PDF viewer so that they can 1/ view 3D Geometry of Building Model, 2/ Extract data from 3DPDF to a CSV or XML file for processing.

I have developed an Excel Macro that will take the CSV data from the 3D PDF and order it on separate sheets to show specific data  in the building in an orderly fashion, such as Room, Door, Window, Seating(for Auditoriums/Theatres), Mechanical, Electrical, Plumbing & Fire etc. Specific information is displayed on separate worksheets or can be exported to separate files.

See here & here for process & 3DPDF’s and here for Excel Macro process.

I wanted to test the versatility of Python by replication this process by creating an Executable file that will take a selected file and convert the data from a CSV to an ordered export CSV or XLS file. I decided I’d create a script for supplying a table for ROOM DATA (wall, floor, ceiling substrates & finishes and areas).

Summary

I was able to import a defined file and process/filter room data and export it to a csv file. It looked very similar to the Excel Macro output (not as nicely formatted though, but Data was aligned in the same structure).

I was able to open a dialogue box and select a csv file to be uploaded and processed and could export Room Data to a csv file. There were issues with this process due to the string import process and “,” delimination in the csv file structure

I have not been able, to date, to create an executable file with either package pyInstaller or cx_Freeze.

The process

1/ After exporting the CSV from the 3DPDF

2/ I initially looked at testing the programme, so used the basic Pandas   pd.read_csv to read a pre-selected file (this would be altered after I got the rest of the code working)

df3d = pd.read_csv(‘KaroriLib.csv’)

3/ After getting the data into a dataframe, I needed to 1/ give the columns header names &  2/filter the dataset so I only had the room data:

# Give columns names
col_headers=[‘family’,’property’,’value’]
df3d.columns =col_headers

# Filter to Rooms only in DataFrame
dfRm=df3d[df3d[‘family’].str.contains(‘3dRmTg’)]

with the results as per table below:

I needed to order the dataframe so that all the data pertaining to one room was on one row. If you look in the table above, for each 3dRmTag there are about 23 items with values. To do this I used the pivot command:

Rooms= dfRm.pivot(values=’value’,index= ‘family’,columns= ‘property’)

The consequence of the pivot was I got the 23 rows of data into columns but also got an extra column for each separate room code [00] + 3dRmTg [1506901] . As these columns held no data, I needed to delete them. The issue is that each building may have a different number of rooms, so there could be 2 or 200 empty columns, so I used:

# This gets list of Column Headers with [00] in them (empty columns-) so can use list to get rid of them.
colDel=[col for col in Rooms.columns if ‘[00]’ in col]

#New Dataframe drops empty columns
Rooms2=Rooms.drop(colDel, axis=1, errors=’ignore’)

Then I needed to reorder the columns that I wanted to display and not show columns that I was not interested in (like [00], [01`],[02],[03],[05],[06],…).

So I explicitly found and described each column I wanted :

BlockCode=[col for col in Rooms2.columns if ‘WCC_BlockCode’ in col]
UnitCode=[col for col in Rooms2.columns if ‘WCC_UnitCode’ in col]
RmName=[col for col in Rooms2.columns if ‘3dRmName’ in col]
RmNum=[col for col in Rooms2.columns if ‘3dRmNumber’ in col]
FlrA=[col for col in Rooms2.columns if ‘Floor-Ceiling Area’ in col]
FlrSS=[col for col in Rooms2.columns if ‘WCC_Flr SS-m2’ in col]
FlrF=[col for col in Rooms2.columns if ‘WCC_FlrFin-m2’ in col]
CeiSS=[col for col in Rooms2.columns if ‘WCC_Clg SS m2’ in col]
CeiF=[col for col in Rooms2.columns if ‘WCC_Clg Fin m2’ in col]
WallA=[col for col in Rooms2.columns if ‘Total Wall Area’ in col]
Per=[col for col in Rooms2.columns if ‘Room Perimeter’ in col]
WallSS=[col for col in Rooms2.columns if ‘WCC_Wall SS1-m2’ in col]
WallF=[col for col in Rooms2.columns if ‘WCC_WallFin’ in col]

Unfortunately, this process made a list of 1 of each item, so when I did this:

cols0=[BlockCode,UnitCode,RmName,RmNum,FlrA,FlrSS,FlrF,CeiSS,CeiF,WallA,WallSS,WallF,Per]

I had  cols0= a list of lists (so lots of brackets within brackets, so I could not make Rooms3=Rooms2[cols0]. Python told me NO.

So I had to add all the lists to a list:

cols0=BlockCode+UnitCode+RmName+RmNum+FlrA+FlrSS+FlrF+CeiSS+CeiF+WallA+WallSS+WallF+Per

Then this worked ( I am still confused with lists & Dataframes and their relation with each other- a bit more learning required):

Rooms3=Rooms2[cols0]

I got all the columns in the order that I wanted, so I could export them now to a file, one extra thing I have is the index row which is the ‘family’ column 3dRmTg [1506901] that is still there on export. I have made a file for it to  export to:

#Export CSV
Rooms3.to_csv(‘ExportRms.csv’, sep=’,’, encoding=’utf-8′)

Output file below, note , it still has ‘family’ column as this was index column in DataFrame

As this is just an exercise looking to extract the data, I was happy that I had the data in the order that I wanted.

Below is the export from the same file using the Excel Macro process:

So tabulated results are pretty similar. I am happy with the output.

Issues

On a larger file, for some reason, I’m not sure if its in the 3D PDF export to CSV, or the Python importing/ordering processes, but there seems to multiple columns for a few of the fields, such as BlockCode , UnitCode etc, with some of the data distributed randomly between these columns. I noticed this on the StJames Theatre 3D PDF model. It does not occur in the Excel Macro process that orders them as you’d expect.

To tidy this up you would need to merge these columns together.This would need to be done by some extra coding. Not particularly difficult, and since it would be a compiled programme, once sorted, would make the programme more robust.

Import and export.

I wanted to be able to create an executable programme that allows the user to select a CSV file.

(A simpler way would be just rename the CSV file to a specific name that the programme could just upload)

Since the reason for the exercise was to learn I thought I’d try and do a search for a file. To do this you need to use the Tkinter package. This is already installed in my Python 3.6 version.

I found this video useful for very basics of how to open a file dialogue box in Python.

from tkinter import *
from tkinter import filedialog
root=Tk()
root.fileName= filedialog.askopenfilename(filetypes=((“CSV files”, “*.csv” ),(“All files”, “*.*” )))

#This creats a str of input csv
text1=open(root.fileName).read()

With the tkinter method of choosing a file from a dialogue box it reads it in as a continuous string. The csv file is “,” deliminated and has “\n” break lines.

First of all, I need to replace the “\n” break lines as “,” so there is a consistency. Unfortunately this straight replacement does not work in one step, I had to use 2 steps with an odd string to make sure it was not replicated in the file.

text2=text1.replace(‘\n’,’XXX00′)
text3=text2.replace(‘XXX00′,’,’)
I can then use the split command, using the “,” deliminator to make a list.
text4=text3.split(“,”)

I then divided the list into 3 separate lists that I would use as columns in a Dataframe. The method below for t1  takes the first position and copies every third item into a list, t2 starts at the 2nd position, t3 at the 3rd position.

t1= text4[0::3]
t2= text4[1::3]
t3= text4[2::3]
When I tried to change these to a dataFrame I would sometimes (depending on the csv file I used) get an error pointing out that the 3 lists above were not the same length, which they needed to be to create the dataFrame below. They were usually only out by one number, but this could be in any of the rows,

df1= pd.DataFrame({‘family’ : t1,
‘property’ : t2,
‘value’ : t3
})

So I had to get the minimum length of all the above lists and create new lists that were only as long as the shortest, ( A lot of these different lengths were white space in the file where it ended):

lenMin=min(len(t1),len(t2),len(t3))
t1a=t1[:lenMin]
t2a=t2[:lenMin]
t3a=t3[:lenMin]

I could then use my dataframe code as discussed in the previous part of this post.

Challenges with this method:

This has some issues because with the “select file” method in Tkinter it imports the data in as a continuous string. Using CSV’s is based on a “,” delimination. So if there is, in one of the fields some text such as “The rabbit is white, with a bowtie” then the “,” in that sentence is read as a “,” delimination. So this can skew the data.

This did actually occur in some of the files I tested and I had to go and use FIND & REPLACE to alter these text strings by replacing the “,” with a “-“.

In some cases I had an empty export .CSV file as an example of the text string above occurred before the lines of the Room Data, so these were not filtered, thereby giving an empty set.

So, although it is doable, using this method of file selection has a high risk of breaking. This is with the Tkinter method that imports as a continuous string.  You would need to pre-validate the file and clean it prior to importing it and using this method.

There was a lot of effort changing the import to a DataFrame. In this case, from a string, to multiple lists, to a dataframe.

If there is a method that will allow you to select a file and import it as either a list or a dataframe then this process would be more robust.

As mentioned earlier, using the pandas import process  as a DataFrame is the more robust method and requires you to change the CSV name to a generic load name such as “test.csv” will work fine.

I am currently leaving the export as a generic file name “export.csv” for testing purposes.

Compiling programme

I have had no success with this so far. There are a number of ways to do it with packages 1/ py2exe,2/ pyinstaller &  3/cx_Freeze.

You choose the .py file that you want to compile and install the compiler package and then tra la! Not so.

1/ py2exe seems to be a python 2.7 package and doesn’t play well with python 3 from what I have seen on the web, although I haven’t tried it.

2/pyinstaller. I have had multiple attempts at, but I do not seem to get a working compiled .exe programme. I even tried it with a file test.py which only had print (“hello World”) and I couldn’t even get that up and running!

Here is a short video that demonstrates how its meant to work.

3/cx_Freeze you write a setup.py file to run that calls all the bits you want. I thought I was getting along with that one as it seemed to compile, but alas, not so. I will have to try it with the file test.py which only had print (“hello World”) and see if I can get that one going.

So the compilers are a bit of a disappointment really, not as easy as what I’d thought they’d be.

End Comment

I have been happy with the learning in Python by doing this project. With the help of StackOverflow I have been able to identify the issues and work my way around them for the coding.

A bit disappointed on the compiling side. I will persevere with this though.

I have been able to replicate the process of taking the 3d pdf export csv and convert it to a specific room csv structured in the same manner as the Excel Macro. I have spent far less time on Python doing it than I did with the VBA coding, and the looping in the VBA coding is pretty messy, whereas the Python process is simpler.

There were a few side issues that came up such as the multiple columns and staggering of the data across those columns on larger files, but I think this could be easily managed with merging those columns.

Add a Comment